Methods and materials for identifying polymorphic variants, diagnosing susceptibilities, and treating disease

ABSTRACT

The invention is directed to materials and methods associated with polymorphic variants in two enzymes involved in folate-dependent and one-carbon metabolic pathways: MTHFD1 (5,10-methylenetetrahydrofolate dehydrogenase, 5,10-methenyltetrahydrofolate cyclohydrolase, 10-formyltetrahydrofolate synthetase) and methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L). Diagnostic and therapeutic methods are provided involving the correlation of polymorphic variants in MTHFD1, MTHFD1L, and other genes with relative susceptibility for various pregnancy-related and other complications.

CROSS REFERENCE TO RELATED APPLICATIONS

This patent application is a divisional of copending U.S. patent application Ser. No. 11/958,126, filed Dec. 17, 2007, which is a continuation-in-part of International Patent Application No. PCT/US2005/021288, filed Jun. 16, 2005, which are incorporated by reference herein.

INCORPORATION-BY-REFERENCE OF MATERIAL ELECTRONICALLY FILED

Incorporated by reference in its entirety herein is a computer-readable nucleotide/amino acid sequence listing submitted concurrently herewith and identified as follows: One 493,113 Byte ASCII (Text) file named “707465ST25.TXT,” created on Jan. 7, 2011.

BACKGROUND OF THE INVENTION

An important enzyme involved in one carbon metabolism is the NADP-dependent trifunctional enzyme MTHFD1 (5,10-methylenetetrahydrofolate dehydrogenase, 5,10-methenyltetrahydrofolate cyclohydrolase, 10-formyltetrahydrofolate synthetase) [Hum et al., J. Biol. Chem., 263:15946-15950 (1988)]. MTHFD1 is often referred to as the “C1-THF synthase” and catalyses the interconversion of tetrahydrofolate to 10-formyl, 5,10-methenyl, and 5,10-methylene derivatives. These derivatives form an important part of de novo DNA synthesis. Promotion of DNA synthesis is desirable in placental and fetal development. In other contexts, such as cancer treatment, blocking of DNA synthesis is desirable. Maternal folate status and/or homocysteine levels have been implicated in a range of pregnancy-related complications, most notably in pregnancies affected by a neural tube defect (NTD).

A polymorphic variant at position 1958 of MTHFD1 at which guanine is replaced with an adenosine results in the substitution of a conserved arginine amino acid with a glutamine at position 653. One study has disclosed that this polymorphic variant is a maternal risk factor for neural tube defects (NTDs) [Brody et al., Am. J. Hum. Genet., 71:1207-1215 (2002)]. Neural tube defects (NTDs) are common congenital malformations that can be presented as anencephaly, encephalocele, and spina bifida. NTDs' etiology likely includes both genetic and environmental factors. Intervention trials have shown that maternal supplementation with folic acid in the period before pregnancy can prevent the majority of NTD-affected pregnancies.

Abruptio placentae or placental abruption is thought to arise from a sudden rupture of the spiral arteries, resulting in the premature separation of a normally implanted placenta [Ananth et al., Obstet. Gynecol., 88:309-318 (1996); Eskes, Eur. J. Obstet. Gyn. R. B., 95:206-212 (2001)]. This event leads to increased risk of adverse outcomes to both mother and baby. The underlying cause of abruptio placentae is unknown, but several factors have been suggested to increase risk including folate deficiency, hyperhomocysteinemia, preeclampsia and history of a prior pregnancy abruption [Kramer et al., Obstet. Gynecol., 89:221-226 (1997); Misra et al., J. Clin. Epidemiol., 52:453-461 (1999); Ray et al., Placenta, 20:519-529 (1999); Eskes, Eur. J. Obstet. Gyn. R. B., 95:206-212 (2001)]. Non-genetic risk factors have been described including cigarette smoking, preeclampsia and increased maternal age [Misra et al., J. Clin. Epidemiol., 52:453-461 (1999); Eskes, Eur. J. Obstet. Gyn. R. B., 95:206-212 (2001)]. Additional risk factors include elevated homocysteine [Goddijn-Wessel et al., Br. Med. J., 2:1431-1436 (1996); van der Molen et al., Am. J. Obstet. Gynecol., 182:1258-1263 (2000)] and low folate levels [Hibbard et al., Br. Med. J., 2:1431-1436 (1963); Streiff et al., N. Engl. J. Med., 276:776-779 (1967); Whalley et al., Am. J. Obstet. Gynecol., 105:670-678 (1969); Hibbard, S. Afr. Med. J., 49:1223-1226 (1975); Goddijn-Wessel et al., Br. Med. J., 2:1431-1436 (1996)].

A substantial proportion (15-50%) of second trimester pregnancy losses remain unexplained [Gaillard et al., Arch. Pathol. Lab. Med., 117:1022-1026 (1993); Faye-Petersen et al., Obstet. Gynecol., 94:915-920 (1999); Incerpi et al., Am. J. Obstet. Gynecol., 178:1121-1125 (1998); Drakeley et al., Hum. Reprod., 13:1975-1980 (1998)]. Although placental insufficiency is a common finding in these cases [Faye-Petersen et al., Obstet. Gynecol., 94:915-920 (1999)], its etiology is often unknown. Sub-optimal folate or B₁₂ metabolism due to either a deficient diet or a genetic predisposition appears to increase the risk of a number of pregnancy complications including spontaneous abortion.

Polymorphisms have been studied in the context of a variety of cancers and other diseases. [See, e.g., Chen et al., Int. J. Cancer, 110:617-620 (2004); Krajinovic et al., Pharmacogenomics J., 4:66-72 (2004); U.S. Pat. Nos. 5,449,605; 5,688,647; 5,719,026; 5,942,390; 6,294,399; 6,312,898; 6,537,759; 6,548,245; 6,627,401; 6,664,062; 6,759,200; 6,818,758; 6,833,243; and 6,872,533; and U.S. Patent Application Publication Nos. 2005/0084849; 2005/0089905; 2005/0095593; and 2005/0112680.]

Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L) is a trifunctional enzyme localized to mitochondria which has been reported to have one or more enzymatic activities in common with MTHFD1. MTHFD1L has been shown to be transcribed into two mRNA transcripts: 1.1 kb and 3.6 kb in size. The shorter transcript is produced by splicing Exon7 with alternative Exon8A and therefore it lacks the 10-formyltetrahydrofolate synthase (synthetase) sequence [Prasannan et al., J. Biol. Chem., 278(44):43178-43187 (2003)].

Methylenetetrahydrofolate reductase (MTHFR) is involved in the remethylation of homocysteine to methionine by generating the necessary methyl group donor 5-methyltetrahydrofolate from 5,10-methylenetetrahydrofolate. Polymorphisms within MTHFR have been extensively studied in relation to a wide variety of diseases including NTDs, cancer and pre-eclampsia. The MTHFR polymorphic variant at 677, where a C is replaced with a T, has a functional effect on the MTHFR enzyme and is associated with elevated plasma homocysteine levels when folate status is low. A relatively small number of studies have examined the MTHFR polymorphisms in relation to abruptio placentae and results have not yielded a clear indication of increased risk for the disorder.

Coagulation factor II (F2) is proteolytically cleaved to form thrombin in the first step of the coagulation cascade, which ultimately results in the stemming of blood loss. F2 also plays a role in maintaining vascular integrity during development and postnatal life. Mutations in F2 can lead to various forms of thrombosis and dysprothrombinemia.

Coagulation factor V (F5) is an essential factor of the blood coagulation cascade. It circulates in plasma and is converted to its active form when thrombin releases the activation peptide during coagulation. Once activated, factor V is a cofactor that participates with activated coagulation factor X to activate prothrombin to thrombin. Defects in this gene result in either an autosomal recessive hemorrhagic diathesis or an autosomal dominant form of thrombophilia, which is known as activated protein C (APC) resistance. A variant of factor V with a particular single point mutation associated with APC resistance is known as factor V Leiden [Bertina et al., Nature, 369:64-67 (1994)].

Transcobalamin II (TCNII) is a member of the vitamin B₁₂-binding protein family. TCNII binds cobalamin and mediates the transport of cobalamin into cells. TCNII polymorphic variant 776C>G (P259R) has been reported to confer an increased fetal genetic risk of early spontaneous abortion [Zetterberg et al., Hum. Reprod., 17:3033-3036 (2002)] and influence levels of circulating vitamin B₁₂ bound to TCNII [Afman et al., Eur. J. Hum. Genet., 10:433-438 (2002), Miller et al., Blood, 100:718-720 (2002)]. TCNII 776C>G polymorphic variant may interact with the MTHFR 677TT genotype to confer an even higher fetal genetic risk of spontaneous abortion than either polymorphism separately [Zetterberg et al., Hum. Reprod., 18:1948-1950 (2003)].

Chemotherapy of cancer has involved use of highly toxic drugs with narrow therapeutic indices, and most adult solid cancers remain highly resistant to treatment. Chemotherapy often results in a significant fraction of treated patients suffering unpleasant or life-threatening side effects while receiving little or no clinical benefit; other patients may suffer few side effects and/or have complete remission or even cure. Chemotherapy is also expensive, based not only on the cost of drugs, but also on the medical care involved with their administration. Tests are needed that better predict chemotherapy efficacy; such tests would allow for more selective use of toxic drugs. In those cases where toxicity of chemotherapy or another drug regimen is at least partially a result of genetic differences, the identification of relevant polymorphic variants will allow for more effective and safer drug use.

Accordingly, to better diagnose and treat pregnancy complications and neoplastic disorders, cardiovascular disorders, Alzheimer's disease and other conditions associated with one carbon metabolic pathways, there is a need to identify polymorphic variants that indicate a relative susceptibility to such diseases and/or relative response to treatments for such diseases.

BRIEF SUMMARY OF THE INVENTION

The invention provides a method of screening for an increased susceptibility for at least one pregnancy-related complication selected from the group consisting of second trimester miscarriage and placental abruption. A sample from a subject is screened to detect the presence or absence of a polymorphic variant of a polymorphism in at least one chromosomal copy of the MTHFD1 gene, wherein the polymorphic variant is associated with an increased susceptibility for at least one pregnancy-related complication selected from the group consisting of second trimester miscarriage and placental abruption. The susceptibility of the subject for at least one pregnancy-related complication selected from the group consisting of second trimester miscarriage and placental abruption is diagnosed based on the presence or absence of the polymorphic variant of at least one chromosomal copy of the MTHFD1 gene.

The invention provides a method of testing for an increased susceptibility for a complication related to a defect in a one-carbon metabolic pathway. A sample from a subject is screened to detect the presence or absence of a polymorphic variant of a polymorphism in at least one chromosomal copy of the MTHFD1L gene, wherein the polymorphic variant is associated with an increased susceptibility for a complication related to a defect in a one-carbon metabolic pathway. The susceptibility of the subject for a complication related to a defect in a one-carbon metabolic pathway is diagnosed based on the presence or absence of the polymorphic variant of at least one chromosomal copy of the MTHFD1L gene.

The invention provides a kit. The kit includes a nucleic acid comprising at least 30 nucleotides of SEQ ID NO: 2 or a complement thereof, the nucleic acid further comprising the sequence of a polymorphic variant associated with an increased susceptibility for at least one pregnancy-related complication selected from the group consisting of second trimester miscarriage, placental abruption, and severe placental abruption. The kit includes instructions for screening a sample from a subject using the nucleic acid. The kit further includes instructions for diagnosing an increased susceptibility for one of said pregnancy-related complications if the polymorphic variant of at least one chromosomal copy of the MTHFD1 gene is detected in the sample.

The invention provides a kit. The kit includes a nucleic acid comprising at least 30 nucleotides of SEQ ID NO: 12 or a complement thereof, the nucleic acid further comprising the sequence of a polymorphic variant associated with an increased susceptibility for at least one complication related to a defect in a one-carbon metabolic pathway. The kit includes instructions for screening a sample from a subject using the nucleic acid. The kit further includes instructions for diagnosing an increased susceptibility for a complication related to a defect in a one-carbon metabolic pathway if the polymorphic variant of at least one chromosomal copy of the MTHFD1L gene is detected in the sample.

In addition to the foregoing, the invention includes, as an additional aspect, all embodiments of the invention narrower in scope in any way than the variations specifically mentioned above. Although the applicant(s) invented the full scope of the claims appended hereto, the claims appended hereto are not intended to encompass within their scope the prior art work of others. Therefore, in the event that statutory prior art within the scope of a claim is brought to the attention of the applicants by a Patent Office or other entity or individual, the applicant(s) reserve the right to exercise amendment rights under applicable patent laws to redefine the subject matter of such a claim to specifically exclude such statutory prior art or obvious variations of statutory prior art from the scope of such a claim. Variations of the invention defined by such amended claims also are intended as aspects of the invention.

DETAILED DESCRIPTION OF THE INVENTION

Determination of Polymorphisms

This invention involves one or more polymorphic variants useful in the field of diagnostics and therapeutics for optimizing efficacy and safety of drug therapy for specific diseases or conditions and for establishing diagnostic tests for pregnancy-related and other complications affected by one carbon metabolic pathways. Methods are presented for identifying polymorphic variants and determining their utility in diagnostic and therapeutic methods, along with probes, kits, and related materials that are useful, for example, in identifying the presence and genotype of a particular polymorphic variant in an individual.

In identifying new correlations between polymorphic variants and disease susceptibilities and treatment approaches, different population groups based on racial, ethnic, gender, and/or geographic origin can be studied. Individuals with a particular disease or condition of interest or altered relative susceptibility thereto can have a higher frequency of certain polymorphic variants than the general population. The polymorphic variants can be predictive of differential, increased or decreased, susceptibility to various disease states, conditions, and complications, independent of ethnicity, race, or geographic origin, even if the polymorphic variant and disease association was originally identified in a particular population, for example, European, Celtic, and Irish populations. Distributions for some of the polymorphic variants are discussed herein.

“Differential” or “differentially” generally refers to a statistically significant different level in the specified property or effect. Preferably, the difference is also functionally significant. “Differential binding or hybridization” is a sufficient difference in binding or hybridization to allow discrimination using an appropriate detection technique. “Differential effect” or “differentially active” in connection with a therapeutic treatment or drug refers to a difference in the level of the effect or activity which is distinguishable using relevant parameters and techniques for the effect or activity being considered. In some embodiments, the difference in effect or activity is also sufficient to be clinically significant, such that a corresponding difference in the course of treatment or treatment outcome would be expected, at least on a probabilistic basis.

“Population” refers to a geographically, ethnically, racially, gender, and/or culturally defined group of individuals or a group of individuals with a particular disease or condition or individuals that may be treated with a specific drug. In most cases a population will preferably encompass at least one hundred, one thousand, ten thousand, one hundred thousand, one million, ten million, or more individuals, with the larger numbers being more preferable. In some embodiments, the population refers to individuals with relative susceptibility to a specific disease or condition and/or amenability to a particular drug regimen. The frequency of one or more polymorphic variants that is predictive of a differential susceptibility to a disease response and/or a response to a particular treatment is determined in one or more populations using a diagnostic test.

Nucleic acid samples, for use in polymorphic variant identification, can be obtained from a variety of sources as known to those skilled in the art, or can be obtained from genomic or cDNA sources by known methods. For example, the Coriell Cell Repository (Camden, N.J.) maintains over 6,000 human cell cultures, mostly fibroblast and lymphoblast cell lines comprising the NIGMS Human Genetic Mutant Cell Repository. A catalog (http://locus.umdnj.edu/nigms) provides racial or ethnic identifiers for many of the cell lines. Cell lines may also be obtained from the Beijing Cancer Institute.

“Allele frequency” is the fraction of genes in a population that have one specific polymorphic variant or set of polymorphic variants. The allele frequencies for any gene should sum to 1. In some embodiments, a polymorphic variant has an allele frequency of at least 0.001, 0.01, 0.05, or 0.10. Another measure of frequency known in the art is the “heterozygote frequency,” namely, the fraction of individuals in a population who carry two different alleles, that is, two forms of a particular polymorphic variant or variant form of a gene, one inherited from each parent. Alternatively, the number of individuals who are homozygous for a particular form of a gene may be a useful measure. The relationship between allele frequency, heterozygote frequency, and homozygote frequency is described for many genes by the Hardy-Weinberg equation. Most human polymorphic variants are substantially in Hardy-Weinberg equilibrium. The allele frequency, heterozygote frequency, or homozygote frequency can be determined experimentally.
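
By way of illustration only, the relationship between allele frequency, heterozygote frequency, and homozygote frequency for a polymorphism with two variants can be sketched as follows. The function names and the genotype counts are hypothetical and are not taken from any study described herein; the sketch assumes a single polymorphism with exactly two variants, A and a.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of variants A and a, from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each individual carries two copies
    p = (2 * n_AA + n_Aa) / total_alleles
    return p, 1.0 - p


def hardy_weinberg_expected(p):
    """Expected genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


# Hypothetical genotype counts, chosen only for illustration.
p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)
exp_AA, exp_Aa, exp_aa = hardy_weinberg_expected(p)
print(f"p={p:.2f} q={q:.2f} expected AA={exp_AA:.2f} Aa={exp_Aa:.2f} aa={exp_aa:.2f}")
```

Comparing the expected heterozygote frequency (2pq) with the experimentally observed fraction of heterozygotes gives a simple indication of whether a polymorphic variant is substantially in Hardy-Weinberg equilibrium in the population sampled.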

To establish the association between a specific condition and one or more polymorphic variants, a study is commonly performed in controlled clinical trials using a limited number of patients that are considered to be representative of the population with the disease or relative susceptibility for the same. The populations should preferably be large enough to have a reasonable chance to find correlations between a particular genetic variant and susceptibility to the disease of interest. In addition, the allele frequency of the genetic variant in a population or subpopulation with the disease or pathology should vary from its allele frequency in the population without the disease pathology (control population) by at least 1%, by at least 2%, by at least 4%, or by at least 8%.

The association between case-control status and genotype can be examined using a number of standard odds ratios. In order to have a common approach for all analyses, a log linear model can be employed. The statistical software (SAS PROC NLMIXED) allows estimation of nonlinear functions of the parameters of the model, and provides standard errors calculated using the delta method [Agresti, Categorical Data Analysis (1990)]. The parameterization of the model can easily be modified for the computation of different odds ratios. This approach enables the researcher to estimate log odds ratios and their standard errors for the computation of confidence intervals, as well as to check the goodness of fit of different models. Potential gene-gene interaction effects can also be examined. Tests of interactive dominant or recessive effects of specific combined genotypes can be performed using a series of non-hierarchical logistic regression models [Piegorsch et al., Stat. Med., 13:153-162 (1994)]. Statistical significance can be assessed using likelihood ratio chi-square tests.
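
As a simplified illustration of an odds ratio and its confidence interval, the following sketch computes both from a single 2x2 table of case-control status against carrier status. It is not the log linear model or the SAS procedure referred to above; it uses the standard large-sample standard error of the log odds ratio, which is the result the delta method gives for a 2x2 table. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(case_carrier, case_noncarrier, control_carrier, control_noncarrier, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is computed on the log scale with standard error
    sqrt(1/a + 1/b + 1/c + 1/d); z=1.96 gives an approximate 95% interval.
    """
    a, b, c, d = case_carrier, case_noncarrier, control_carrier, control_noncarrier
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, chosen only for illustration.
odds_ratio, lower, upper = odds_ratio_ci(40, 60, 20, 80)
print(f"OR={odds_ratio:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 suggests an association between genotype and case-control status at the chosen confidence level; odds ratios for dominant or recessive models can be obtained by regrouping the genotype counts before forming the table.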

The polymorphism variant(s) showing the strongest correlation with an altered relative susceptibility for a disease state within a given gene are likely either to have a causative role in the manifestation of the phenotype or to be in linkage disequilibrium with the causative variants. Such a role can be confirmed by in vitro gene expression of the variant gene or by producing a transgenic animal expressing a human gene bearing such a polymorphic variant and determining whether the animal develops a relevant disease. Polymorphic variants in coding regions that result in amino acid changes can change relative susceptibility for a disease state by decreasing, increasing, or otherwise altering the activity of the protein encoded by the gene in which the polymorphism occurs. Polymorphic variants in coding regions that introduce stop codons can change relative susceptibility for a disease state by reducing (heterozygote) or eliminating (homozygote) functional protein produced by the gene. In some embodiments, stop codons result in production of a truncated peptide with aberrant activities relative to the full-length protein. Polymorphisms in regulatory regions can change relative susceptibility for a disease state by causing increased or decreased expression of the protein encoded by the gene in which the polymorphism occurs. Polymorphic variants in intronic or untranslated sequences can change relative susceptibility for a disease state either through the same mechanism as polymorphic variants in regulatory sequences or by causing altered splicing patterns resulting in an altered protein.

Types of Polymorphisms

As used herein, a “gene” is a sequence of DNA present in a cell that directs the expression of a “gene product,” most commonly by transcription to produce RNA and translation to produce protein. An “allele” is a particular form of a gene. The term allele is relevant when there are two or more forms of a particular gene. Genes and alleles are not limited to the open reading frame of the genomic sequence or the cDNA sequence corresponding to processed RNA. A gene and allele can also include sequence upstream and downstream of the genomic sequence such as promoters and enhancers. The terms “gene product,” or “polymorphic variant allele product” refer to a product resulting from transcription of a gene. Gene and polymorphic variant allele products include partial, precursor, and mature transcription products such as pre-mRNA and mRNA, and, translation products with or without further processing including, without limitation, lipidation, phosphorylation, glycosylation, other modifications known in the art, and combinations of such processing. RNA may be modified without limitation by complexing with proteins, polyadenylation, splicing, capping or export from the nucleus.

A “polymorphism” is a site in the genome that varies between two or more individuals or within an individual in the case of a heterozygote. The frequency of the variation can be defined above a specific value for inclusion of variations generally observed in a population as opposed to random mutations. Polymorphisms that can be screened according to the invention include variation both inside and outside the open reading frame. When outside the reading frame the polymorphism can occur within 200, 500, 1000, 2000, 3000, 5000, or more nucleotides of either the 5′ or 3′ end of the open reading frame. When inside the reading frame, the polymorphism may occur within an exon or intron, or overlapping an exon/intron boundary. A polymorphism could also overlap the open reading frame and sequence outside of that frame. Many polymorphisms have been given an “rs” designation in the SNP database of NCBI's Entrez; some of these designations have been provided herein for the polymorphisms that can be screened according to the invention.

A “polymorphic variant” is a particular form or embodiment of a polymorphism. For example, if the polymorphism is a single nucleotide polymorphism, a particular variant could potentially be an “A” (adenosine), “G” (guanine), “T” (thymine), or “C” (cytosine). When the variant is a “T”, it is understood that a “U” can occur in those instances wherein the relevant nucleic acid molecule is RNA, and vice versa in respect to DNA. The convention “PositionNUC1>NUC2” is used to indicate a polymorphism contrasting one variant from another. For example, 242A>C would refer to a cytosine instead of an adenosine occurring at position 242 of a particular nucleic acid sequence. When 242A>C is used in respect to a mRNA/cDNA, it can also be used to represent the polymorphism as it occurs in the genomic DNA with the understanding that the position number will likely be different in the genome. Sequence and polymorphic location information for both coding domain sequence and genomic sequence is described herein for the genes relevant to the invention. “Polymorphic variant allele” refers to an allele comprising a particular polymorphic variant or a particular set of polymorphic variants corresponding to a particular set of polymorphisms. Two alleles can both be considered the same polymorphic variant allele if they share the same variant or set of variants defined by the polymorphic variant allele even though they may differ in respect to other polymorphisms or variation outside the definition. For a mutation at the amino acid level, the convention “AA1PositionAA2” is used. For example, in the context of amino acid sequence, M726L would indicate that the underlying nucleotide-level polymorphism(s) has resulted in a change from a methionine to a leucine at position 726 in the amino acid sequence.
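
The two naming conventions can be read mechanically. The following sketch, offered only as an illustration, parses designations written in the “PositionNUC1>NUC2” and “AA1PositionAA2” forms; the type and function names are hypothetical, and the sketch assumes single-letter nucleotide and amino acid codes with no prefixes or other annotations.

```python
import re
from typing import NamedTuple


class NucleotideVariant(NamedTuple):
    position: int
    reference: str  # NUC1, the variant being replaced
    variant: str    # NUC2, the variant occurring instead


class AminoAcidVariant(NamedTuple):
    reference: str  # AA1
    position: int
    variant: str    # AA2


_NUC = re.compile(r"(\d+)([ACGTU])>([ACGTU])")
_AA = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_nucleotide_variant(text):
    """Parse a designation such as '242A>C'."""
    m = _NUC.fullmatch(text)
    if m is None:
        raise ValueError(f"not in PositionNUC1>NUC2 form: {text!r}")
    return NucleotideVariant(int(m.group(1)), m.group(2), m.group(3))


def parse_amino_acid_variant(text):
    """Parse a designation such as 'M726L'."""
    m = _AA.fullmatch(text)
    if m is None:
        raise ValueError(f"not in AA1PositionAA2 form: {text!r}")
    return AminoAcidVariant(m.group(1), int(m.group(2)), m.group(3))


print(parse_nucleotide_variant("242A>C"))
print(parse_amino_acid_variant("M726L"))
```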

A “genotype” can refer to a characterization of an individual's genome in respect to one or both alleles and/or one or more polymorphic variants within that allele. A subject can be characterized at the level that the subject contains a particular allele, or at the level of identifying both members of an allelic pair, the corresponding alleles on the set of two chromosomes. One can also be characterized at the level of having one or more polymorphic variants. The term “haplotype” refers to a cis arrangement of two or more polymorphic variants on a particular chromosome, such as in a particular gene. The haplotype preserves the information of the phase of the polymorphic nucleotides, that is, which set of polymorphic variants was inherited from one parent, and which from the other. Where methods, materials, and experiments are described for the invention in respect to polymorphic variants, one will understand that they can also be adapted for use with an analogous haplotype.

A single nucleotide polymorphism (SNP) refers to a variation at a single nucleotide location. In some cases the variation at the position could be any one of the four nucleotide bases; in others the variation is some subset of the four bases. For example, the variation could be between the two purine bases or between the two pyrimidine bases. Simple-sequence length polymorphisms (SSLPs) or short tandem repeat polymorphisms (STRPs) involve the repeat of a particular sequence of one or more nucleotides. A restriction fragment length polymorphism (RFLP) is a variation in the genetic sequence that results in the appearance or disappearance of an enzymatic cleavage site depending on which base(s) are present in a particular allele.
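
The effect of an RFLP on fragment pattern can be sketched as follows. The two allele sequences are hypothetical and differ at a single base; the recognition sequence GAATTC, cut after the first G, is that of the restriction enzyme EcoRI. The sketch searches one strand only, which is adequate here because the recognition sequence is palindromic.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Lengths of the fragments produced by cutting `sequence` at every
    occurrence of `site`, with the cut placed `cut_offset` bases into the site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical alleles: the second carries G instead of A inside the site,
# which abolishes the cleavage site.
allele_with_site = "ACGT" + "GAATTC" + "TTGCA"
allele_without_site = "ACGT" + "GAGTTC" + "TTGCA"

print(fragment_lengths(allele_with_site, "GAATTC", 1))     # two fragments
print(fragment_lengths(allele_without_site, "GAATTC", 1))  # one uncut fragment
```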

A diagnosis for a given susceptibility in accordance with this invention includes detection of homozygosity and/or heterozygosity for a given polymorphism(s). Heterozygosity and homozygosity are relevant wherein the cell tested has two chromosomal copies. In other contexts, such as in a sperm or egg, only a single chromosome is present so that the issue of homozygosity or heterozygosity does not directly present itself. In some embodiments, such as those involving cancer, homozygosity or heterozygosity can be lost or at least obscured because of deletion or inactivation of one of the two gene copies.

In those embodiments where a sample is screened to detect the presence or absence of more than one polymorphic variant associated with a given condition, the combination of the polymorphic variants can be additive, synergistic, or even antagonistic with regard to correlative strength, although not so antagonistic that the susceptibility or drug effect probability is lost. When screening for multiple polymorphisms, all can be heterozygous, all can be homozygous, or one or more polymorphisms can be homozygous and one or more heterozygous, depending on the particular susceptibility relationship for a given set of polymorphic variants and a condition or drug response.

The polymorphic variants described herein can be associated with an altered susceptibility to one or more complications and/or therapeutic treatments. How a polymorphism is associated with this susceptibility need not be known for the usefulness and operability of the invention. The polymorphism need not actually cause or contribute to etiology or severity of the condition. In some embodiments, the polymorphism can cause or contribute to the condition. In some embodiments, the polymorphism serves as a marker for another polymorphism(s) responsible for causing or contributing to the condition. In such a situation, the polymorphism(s) screened for can be in linkage disequilibrium with the responsible polymorphism(s).

Linkage is the tendency of genes or DNA sequences, for example, polymorphisms, to be inherited together as a consequence of their physical proximity on a single chromosome. The closer together the markers are, the lower the probability that they will be separated during DNA crossing over, and hence the greater the probability that they will be inherited together. If a mutational event introduces a “new” allele in the close proximity of a gene or an allele, the new allele will tend to be inherited together with the alleles present on the “ancestral” chromosome or haplotype. However, the resulting association, called linkage disequilibrium, will decline over time due to recombination. Linkage disequilibrium has been used to map disease genes. In general, both allele and haplotype frequencies differ among populations. Linkage disequilibrium varies among populations, being absent in some and highly significant in others.

Linkage disequilibrium (LD) or allelic association means the preferential association of a particular allele or genetic marker with a specific allele or genetic marker at a nearby chromosomal location more frequently than expected by chance for any particular allele frequency in the population. For example, if locus P has alleles x and y, which occur with equal frequency, and linked locus Q has alleles w and z, which occur with equal frequency, one would expect the haplotype xw to occur with a frequency of 0.25 in a population of individuals. If xw occurs more frequently, then alleles x and w are considered in linkage disequilibrium. Linkage disequilibrium may result from natural selection of a certain combination of alleles or because an allele has been introduced into a population too recently to have reached equilibrium between linked alleles.
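
The example of loci P and Q can be expressed numerically. In the sketch below, the disequilibrium coefficient D is the observed frequency of the haplotype carrying allele x at locus P and allele w at locus Q, minus the frequency expected if the two alleles were independent; r² is a commonly used normalized form of D. The function name and the observed haplotype frequency of 0.40 are hypothetical.

```python
def linkage_disequilibrium(p_haplotype, p_x, p_w):
    """Return (D, r_squared) for alleles x and w.

    D is the observed haplotype frequency minus the frequency expected
    under independence (p_x * p_w); D = 0 means no disequilibrium.
    """
    d = p_haplotype - p_x * p_w
    r_squared = (d * d) / (p_x * (1 - p_x) * p_w * (1 - p_w))
    return d, r_squared


# Alleles x and w each occur with frequency 0.5, so the haplotype is
# expected at 0.5 * 0.5 = 0.25. The observed value is hypothetical.
d, r_squared = linkage_disequilibrium(p_haplotype=0.40, p_x=0.5, p_w=0.5)
print(f"D={d:.2f}, r^2={r_squared:.2f}")
```

A positive D indicates that the haplotype occurs more often than expected by chance, which is the situation described above as linkage disequilibrium between alleles x and w.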

A marker in linkage disequilibrium with disease predisposing variants can be particularly useful in detecting susceptibility to disease or association with sub-clinical phenotypes notwithstanding that the marker does not cause the disease. For example, a marker P that is not itself a causative element of a disease, but which is in linkage disequilibrium with a gene Q that is a causative element of a phenotype, can be used to indicate susceptibility to the disease in circumstances in which the gene Q may not have been identified or may not be readily detectable. Evolutionarily young alleles are expected to have a larger genomic segment in linkage disequilibrium. The age of an allele can be determined from whether the allele is shared among different human ethnic groups and/or between humans and related species.

The polymorphisms described herein can also be used to establish physical linkage between a genetic locus associated with a trait of interest and polymorphic markers that are not associated with the trait, but are in physical proximity with the genetic locus responsible for the trait and co-segregate with the responsible variation. Such analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position and thereby cloning gene(s) responsible for the trait [Lander et al., Proc. Natl. Acad. Sci. (USA), 83:7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA), 84:2363-2367 (1987); Donis-Keller et al., Cell, 51:319-337 (1987); Lander et al., Genetics, 121:185-199 (1989)]. Genes localized by linkage can be cloned by a process known as positional cloning [Wainwright, Med. J. Australia, 159:170-174 (1993); Collins, Nature Genetics, 1:3-6 (1992)]. Linkage studies can be performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. [See, e.g., Kerem et al., Science, 245:1073-1080 (1989); Monaco et al., Nature, 316:842 (1985); Yamoka et al., Neurology, 40:222-226 (1990); Rossiter et al., FASEB J., 5:21-27 (1991).]

Linkage is analyzed by calculation of lod (log of the odds) values. A lod value is the relative likelihood of obtaining observed segregation data for a marker and a genetic locus when the two are located at a recombination fraction θ, versus the situation in which the two are not linked, and thus segregating independently [Thompson & Thompson, Genetics in Medicine (5th ed, W. B. Saunders Company, Philadelphia, 1991); Strachan, “Mapping the Human Genome” in The Human Genome (BIOS Scientific Publishers Ltd, Oxford), Chapter 4]. A series of likelihood ratios is calculated at various recombination fractions (θ), ranging from θ=0.0 (coincident loci) to θ=0.50 (unlinked). The computed likelihoods are usually expressed as the log₁₀ of this ratio, known as a “lod” score. For example, a lod score of 3 indicates 1000:1 odds against an apparent observed linkage being a coincidence. The use of logarithms allows data collected from different families to be combined by simple addition. Computer programs are available for the calculation of lod scores for differing values of θ, for example, LIPED, MLINK [Lathrop, Proc. Nat. Acad. Sci. (USA), 81:3443-3446 (1984)]. For any particular lod score, a recombination fraction may be determined from mathematical tables. [See Smith et al., Mathematical tables for research workers in human genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet., 32:127-150 (1968).] The value of θ at which the lod score is the highest is considered to be the best estimate of the recombination fraction. Positive lod score values suggest that the two loci are linked, whereas negative values suggest that linkage is less likely (at that value of θ) than the possibility that the two loci are unlinked. By convention, a combined lod score of +3 or greater (equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive evidence that two loci are linked. Similarly, by convention, a negative lod score of −2 or less is taken as definitive evidence against linkage of the two loci being compared. Negative linkage data are useful in excluding a chromosome or a segment thereof from consideration. The search focuses on the remaining non-excluded chromosomal locations.
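The lod calculation described above can be sketched for the simplest case. This sketch assumes phase-known meioses, in which each meiosis can be scored as recombinant or non-recombinant; it is not a substitute for programs such as LIPED or MLINK. The function names and the three family counts are hypothetical.

```python
import math


def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """log10 of the likelihood at recombination fraction theta relative to
    the likelihood for unlinked loci (theta = 0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0.0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # For example, theta = 0 when recombinants were observed.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


def combined_lod(theta: float, families: list[tuple[int, int]]) -> float:
    """Lod scores from different families are combined by simple addition."""
    return sum(lod_score(theta, r, n) for r, n in families)


def best_theta(families: list[tuple[int, int]], step: float = 0.01) -> float:
    """Return the grid value of theta at which the combined lod is highest."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda t: combined_lod(t, families))


# Hypothetical data: (recombinants, informative meioses) for three families.
families = [(1, 10), (0, 8), (2, 12)]
theta_hat = best_theta(families)
print(f"best theta = {theta_hat:.2f}, "
      f"combined lod = {combined_lod(theta_hat, families):.2f}")
```

With these illustrative counts the combined lod at the best estimate of θ exceeds the conventional threshold of +3, although no single family does so on its own.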

In those embodiments where the screened-for polymorphic variant(s) is responsible in part or whole for the condition(s), the polymorphic variant(s) can result in a change in the steady state level of mRNA, for example, through a decrease in transcription and/or mRNA stability. Some polymorphic variants can alter the exon/intron boundary and/or affect how splicing occurs. When the polymorphic variant occurs within or overlaps with the protein-encoding sequence of the gene, the polymorphic variant may be silent, resulting in no change at the amino acid level, or may result in a change of one or more amino acid residues, a deletion of one or more amino acids, an addition of one or more amino acids, or some combination of such changes. For some polymorphic variants, the result is premature termination of translation. The effect may be neutral, beneficial, or detrimental, or both beneficial and detrimental, depending on the circumstances. Polymorphic variants occurring in noncoding regions can exert phenotypic effects indirectly via influence on replication, transcription, and translation. Polymorphic variants in DNA can affect the basal transcription or regulated transcription of a gene locus. Such polymorphic variants may be located in any part of the gene but are most likely to be located in the promoter region, the first intron, or in 5′ or 3′ flanking DNA, where enhancer or silencer elements may be located. A single polymorphism can affect more than one phenotypic trait. A single phenotypic trait may be affected by polymorphisms in different genes. Some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.

Determining what effect, if any, a polymorphic variant has on the disease state, condition, or complication with which it is correlated can be useful in the context of certain aspects of the invention, for example, choosing a proper therapy. Methods for analyzing transcription are well known to those skilled in the art. A transcriptional run-off assay is one useful method. Detailed protocols for useful methods can be found in texts such as: Current Protocols in Molecular Biology edited by: F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, K. Struhl, John Wiley & Sons, Inc. (1999), or Molecular Cloning: A Laboratory Manual by J. Sambrook, E. F. Fritsch and T. Maniatis, Cold Spring Harbor Laboratory Press, 2nd edition (1989).

RNA polymorphic variants can affect a wide range of processes including RNA splicing, polyadenylation, capping, export from the nucleus, interaction with translation initiation, elongation or termination factors, or the ribosome, or interaction with cellular factors including regulatory proteins, or factors that may affect mRNA half-life. An effect of polymorphic variants on RNA function can ultimately be measurable as an effect on RNA levels, whether basal levels, regulated levels, or levels in some abnormal cell state. One method for assessing the effect of RNA polymorphic variants on RNA function is to measure the levels of RNA produced by different alleles in one or more conditions of cell or tissue growth. Such measuring can be done by conventional methods such as Northern blots or RNase protection assays, which can employ kits available from Ambion, Inc., or by methods such as the Taqman assay, or by using arrays of oligonucleotides or arrays of cDNAs or other nucleic acids attached to solid surfaces, such as a multiplex chip. Systems for arraying cDNAs are available commercially from companies such as Nanogen and General Scanning. Complete systems for gene expression analysis are available from companies such as Molecular Dynamics. See also supplement to volume 21 of Nature Genetics entitled “The Chipping Forecast.” Additional methods for analyzing the effect of polymorphic variants on RNA include secondary structure probing, and direct measurement of half-life or turnover. Secondary structure can be determined by techniques such as enzymatic probing with use of enzymes such as T1, T2, and S1 nuclease, chemical probing, or RNase H probing using oligonucleotides. Some RNA structural assays can be performed in vitro or on cell extracts.

To determine if one or more polymorphic variants have an effect on protein levels and/or activity, a variety of techniques may be employed. The in vitro protein activity can be determined by transcription or translation in bacteria, yeast, baculovirus, COS cells (transient), or CHO cells, or can be studied directly in human cells. Further, one can perform pulse chase experiments for the determination of changes in protein stability, such as half-life measurements. One can adapt the cell assay to group the cells by genotype or phenotype. For example, identification of cells with different genotypes and phenotypes can be performed using standardized laboratory molecular biological protocols. After identification and grouping, one skilled in the art could determine whether there exists a correlation between cellular genotype and cellular phenotype.

Correlation between one or more polymorphic variants and a phenotype can be performed for a population of individuals who have been tested for the presence or absence of a pregnancy complication or a disease state such as cancer or an intermediate phenotype. Correlation can be performed by standard statistical methods including, but not limited to, the chi-squared test, analysis of variance, parametric linkage analysis, and non-parametric linkage analysis. Statistically significant correlations between polymorphic form(s) and phenotypic characteristics also can be used.
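As one illustration of the chi-squared approach, the sketch below tests whether carrying a variant allele is associated with case status in a 2x2 table. It assumes Pearson's statistic without continuity correction and one degree of freedom; the function name and all counts are hypothetical and do not represent study data.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test for a 2x2 table.

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 30 of 50 cases and 15 of 50 controls carry the variant.
stat, p = chi_squared_2x2(30, 20, 15, 35)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

For small expected counts an exact test would be the more usual choice; the chi-squared form is shown only because it is the method named in the text.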

Genes and Polymorphic Variants

MTHFD1

Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1, methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase (MTHFD1) is a trifunctional enzyme localized to the cytoplasm. MTHFD1 has the further aliases HGNC:7432, MTHFC, and MTHFD. MTHFD1 has the further designations 5,10-methylenetetrahydrofolate dehydrogenase, 5,10-methylenetetrahydrofolate cyclohydrolase, 10-formyltetrahydrofolate synthetase; C1-THF synthase; MTHFC; MTHFD; NADP-dependent cyclohydrolase/formyltetrahydrofolate synthetase; cytoplasmic C-1-tetrahydrofolate synthase; methylenetetrahydrofolate dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase; methylenetetrahydrofolate dehydrogenase 1. MTHFD1 has been assigned Gene ID 4522, and is positioned on chromosome 14 at locus 14q24. Further information for MTHFD1 is found on the NCBI website in the Entrez Gene database and Online Mendelian Inheritance in Man (OMIM) website under entry +172460.

MTHFD1 nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, and fragments thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of MTHFD1 sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.
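Because a polymorphism is reflected in both strands, a variant written on one strand can be restated on the other. The sketch below shows this with a reverse complement, the conventional way of writing the complementary strand in the 5′ to 3′ direction. The function name and the short sequences are hypothetical and are not taken from the sequence listing.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical fragment with a G>A variant at the fourth base.
reference = "CTCGGAT"
variant = "CTCAGAT"

# On the complementary strand the same variant reads as C>T.
print(reverse_complement(reference))
print(reverse_complement(variant))
```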

The following are representative sequences for MTHFD1. NM_005956 includes coding nucleic acid sequence of MTHFD1 (SEQ ID NOS: 1 and 2, with SEQ ID NO: 2 providing the nucleic acid sequence of the coding region) and also provides the amino acid sequence of MTHFD1, which is the translation of the coding region (SEQ ID NO: 3). Other relevant sequence information includes J04031; NP_005947; BC001014; AAH01014; BC009806; AAH09806; BC050420; AAH50420; J04031; AAA59574; P11586. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. An example of such a fragment is provided in SEQ ID NO: 4. The genomic sequence is provided in SEQ ID NO: 5 and corresponds to positions 63924886 to 63996474 inclusive in NC_000014. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. An example of such a fragment is also provided in SEQ ID NO: 4. SEQ ID NOS: 1-5 indicate the variability corresponding to the MTHFD1 1958G>A polymorphism (at the nucleotide level: position 2011 in SEQ ID NO: 1; position 1958 in SEQ ID NO: 2; position 15 in SEQ ID NO: 4; position 63978638 in the NC_000014 genomic sequence, corresponding to position 53753 in SEQ ID NO: 5) and the Arg653Gln polymorphism in the amino acid sequence (SEQ ID NOS: 1 and 3). This polymorphism is given the designation rs2236225 in the SNP database of NCBI's Entrez. Allele frequencies for the MTHFD1 1958G>A polymorphic variant are as follows:

Geographical/Ethnic Populations    A Allele Frequency
Ireland                            0.45
The Netherlands                    0.45
Germany                            0.40
Italy                              0.45
Turkey                             0.45
Africa                             0.16
Israel                             0.47
Pakistan                           0.50
Northern China                     0.24
Mexico                             0.61
Brazil                             0.79

[Brody et al., Am. J. Hum. Genet., 71: 1207-1215 (2002); Hol et al., Clin. Genet. 53: 119-125 (1998); Akar & Akar, Acta Haematol., 102: 199-200 (1999); Konrad et al., J. Neurol., 251: 1242-1248 (2004); Cheng et al., Biomed. Environ. Sci., 18: 58-64 (2005); Shi et al., Birth Defects Res. Part A, 67: 545-549 (2003); DeMarco et al., 48th Annual Meeting of the Society for Research into Hydrocephalus and Spina Bifida, Dublin 23-26 Jun. 2004.]

MTHFD1L

Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L) is a trifunctional enzyme localized to mitochondria with enzymatic activity similar to MTHFD1, sharing at least one enzymatic activity with that enzyme. MTHFD1L has the further aliases HGNC:21055, DKFZp586G1517, FLJ21145, FTHFSDC1, dJ292B18.2, and further designations RP1-292B18.2; formyltetrahydrofolate synthetase domain containing 1; mitochondrial C1-tetrahydrofolate synthase; mitochondrial C1-tetrahydrofolate synthetase. MTHFD1L has been assigned Gene ID 25902, and is positioned on chromosome 6 at locus 6q25.1. Further information for MTHFD1L is found on the NCBI website in the Entrez Gene database.

MTHFD1L nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, and fragments thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of MTHFD1L sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.

The following are representative sequences for MTHFD1L. NM_015440 includes coding nucleic acid sequence of MTHFD1L (SEQ ID NOS: 6 and 7, with SEQ ID NO: 7 providing the nucleic acid sequence of the coding region) and also provides the translation of the coding region (SEQ ID NO: 8). These sequences correspond to a 3.6 kb transcript. AY374131 includes coding nucleic acid sequence of MTHFD1L (SEQ ID NOS: 9 and 10, with SEQ ID NO: 10 providing the nucleic acid sequence of the coding region) and amino acid sequence (SEQ ID NO: 11) for a 1.1 kb transcript of MTHFD1L. Other relevant sequence information includes NP_056255; AA478842; AL117452; AV704883; BE735249; BQ062382; AL035086; CAI42788; CAI42793; CAI42794; CAI42795; AL133260; CAC03667; AA478842; AB127387; BAD93193; AK024798; BAB15009; AK127089; AL117452; CAB55934; AV704883; AY374130; AAQ82696; AAQ82697; BC008629; AAH08629; BC017477; AAH17477; BE735249; BQ062382. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. The genomic sequence is provided in SEQ ID NO: 12 and corresponds to positions 151278805 to 151515137 inclusive in NC_000006. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. An example of such a fragment is provided in SEQ ID NO: 13. SEQ ID NOS: 12 and 13 indicate the variability corresponding to the “ATT” short tandem repeat polymorphism starting at position 151312078 in the source genomic sequence (position 33274 of SEQ ID NO: 12 and position 5 of SEQ ID NO: 13). This polymorphism is given the designation rs3832406 in the SNP database of NCBI's Entrez, and also corresponds to position 55374834 in NT_025741. As this polymorphism is located in an intron, the polymorphism is not designated in SEQ ID NOS: 6-11. In some embodiments, the relevant polymorphism has an effect on splicing, and accordingly an effect on the transcript and the amino acid sequence encoded by the same.

Allele frequencies for the MTHFD1L rs3832406 “ATT” Intron 7 tandem repeat are as follows with Allele 1 comprising “ATT” repeated seven times, Allele 2 comprising “ATT” repeated eight times, and Allele 3 comprising “ATT” repeated nine times.

Geographical/Ethnic Population    Allele 1    Allele 2    Allele 3
Ireland                           0.64        0.21        0.15
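The three rs3832406 alleles differ only in the number of consecutive “ATT” units (seven, eight, or nine), so a sequenced read spanning the repeat can be classified by counting them. The following is a minimal sketch under that assumption; the function names and the flanking bases are hypothetical and are not taken from SEQ ID NO: 12 or 13.

```python
import re

# Repeat counts for the three alleles as defined in the text.
ALLELE_BY_REPEATS = {7: "Allele 1", 8: "Allele 2", 9: "Allele 3"}


def longest_att_run(seq: str) -> int:
    """Return the largest number of consecutive ATT units found in seq."""
    runs = re.findall(r"(?:ATT)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def call_allele(seq: str) -> str:
    """Classify a read by its ATT repeat count."""
    return ALLELE_BY_REPEATS.get(longest_att_run(seq), "unclassified")


# Hypothetical read: invented flanking bases around eight ATT units.
read = "GCCTG" + "ATT" * 8 + "CAGGA"
print(call_allele(read))
```

A production genotyping assay would also need to anchor the count to the known flanking sequence and handle sequencing errors; this sketch only illustrates the allele definitions.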

The MTHFD1L gene produces two mRNA transcripts. The shorter one originates from the use of an alternative exon 8A that may be derived from an Alu element. Although not wishing to be bound by any particular theory, it appears that these alleles affect how efficiently alternative exon 8A is used. Any putative effect is relevant to folate metabolism since alternative exon 8A produces a premature stop codon that translates into a protein product that lacks a synthetase domain.

Other Diagnostic Genes and Polymorphic Variants

Polymorphic variants to be screened for are principally located in or in close proximity to the MTHFD1 and/or MTHFD1L genes. Representative polymorphic variants that can be tested for in addition to MTHFD1 and/or MTHFD1L variant(s) include those associated with the genes described below, without limitation to polymorphism or gene. In some embodiments, the screened-for polymorphic variants are correlated with the same disease. In some embodiments, the screened-for polymorphic variants are correlated with different diseases.

MTHFR

5,10-methylenetetrahydrofolate reductase (NADPH) (MTHFR) is an enzyme involved in one-carbon metabolic pathways such as folate-dependent one-carbon pathways. MTHFR has the further alias HGNC:7436. MTHFR has the further designations methylenetetrahydrofolate reductase; methylenetetrahydrofolate reductase intermediate form. MTHFR has been assigned Gene ID 4524, and is positioned on chromosome 1 at locus 1p36.3. Further information for MTHFR is found on the NCBI website in the Entrez Gene database and Online Mendelian Inheritance in Man (OMIM) website under entry *607093. Polymorphic variants that can be screened for in addition to one or more of the MTHFD1 and MTHFD1L polymorphic variants relevant to the invention include the polymorphic variant described in the OMIM MTHFR entry *607093 as allelic variant 0.0003 MTHFR 677C>T, Ala222Val. Frosst et al., Mammalian Genome, 7:864-869 (1995), reported the 677C>T mutation in the MTHFR gene, resulting in an Ala222Val substitution. Polymorphic variants that can be screened for in addition to one or more of the MTHFD1 and MTHFD1L polymorphic variants relevant to the invention also include the polymorphic variant described in the OMIM MTHFR entry *607093 as allelic variant 0.0004 MTHFR 1298A>C, Glu429Ala. Van der Put et al., Am. J. Hum. Genet., 62:1044-1051 (1998), identified another polymorphism of the MTHFR gene: a 1298A>C mutation resulting in a Glu429Ala substitution.

MTHFR nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, and fragments thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of MTHFR sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.

The following are representative sequences for MTHFR. NM_005957 includes coding nucleic acid sequence of MTHFR and also provides the translation of the coding region. Other relevant sequence information includes AF105977; AAD17965; AF105978; AAD17965; AF105979; AAD17965; AF105980; AAD17965; AF105981; AAD17965; AF105982; AAD17965; AF105983; AAD17965; AF105984; AAD17965; AF105985; AAD17965; AF105986; AAD17965; AF105987; AAD17965; AF398930; AAN40863; AAN40864; AAN40865; AJ249275; CAB81551; CAB81552; AL953897; CAI15885; CAI15886; CAI15887; CAI15888; CAI15889; AY338232; AAP88033; AB209113; BAD92350; AJ237672; CAB41971; AY046560; AAL17646; AY046561; AAL17647; AY046562; AAL17648; AY046563; AAL17649; AY046564; AAL17650; AY046565; AAL17651; BC011614; BC018766; AAH18766; BC053509; AAH53509; P42898. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. The genomic sequence corresponds to positions 11780945 to 11800248 inclusive in NC_000001. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. The variability corresponding to the 677C>T polymorphism occurs at position 11790510 in the genomic sequence. The variability corresponding to the 1298A>C polymorphism occurs at position 11792412 in the genomic sequence.

Allelic frequencies for MTHFR 677C>T are as follows:

Geographical/Ethnic Populations    T Allele Frequency
Ireland                            0.29
Spain                              0.34
France                             0.36
Germany                            0.29
The Netherlands                    0.27
Russia                             0.27
Italy                              0.41
Southern Italy                     0.46
Israel                             0.26
Canadian White                     0.25
Ashkenazi Jewish                   0.48
Northern Han Chinese               0.44
Southern Han Chinese               0.34
Australian white                   0.29
Mexico                             0.57
African Americans                  0.12
US Caucasians                      0.32
US Hispanics                       0.45
US Asian                           0.21

[Wilcken et al., J. Med. Genet., 40: 619-625 (2003); Rady et al., Am. J. Med. Genet., 107: 162-168 (2002); Kirke et al., BMJ, 328: 1535-1536 (2004); Konrad et al., J Neurol., 251: 1242-1248 (2004).]

Factor II

Coagulation factor II (F2) is a factor that is cleaved from prothrombin to thrombin in the blood clotting cascade. F2 has the further aliases HGNC:3535 and PT. F2 has the further designations prothrombin; prothrombin B-chain; serine protease. F2 has been assigned Gene ID 2147, and is positioned on chromosome 11 at locus 11p11-q12. Further information for F2 is found on the NCBI website in the Entrez Gene database and Online Mendelian Inheritance in Man (OMIM) website under entry +176930. Polymorphic variants that can be screened for in addition to one or more of the MTHFD1 and MTHFD1L polymorphic variants relevant to the invention include the polymorphic variant described in the OMIM Coagulation Factor II +176930 entry as allelic variant 0.0009; 20210G>A. Poort et al., Blood, 88:3698-3703 (1996), described this common genetic variation in the 3-prime untranslated region of the gene that is associated with elevated plasma prothrombin levels and an increased risk of venous thrombosis: a G-to-A transition at position 20210 (see Degen and Davie, Biochemistry, 26:6165-6177 (1987)).

F2 nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, and fragments thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of F2 sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.

The following are representative sequences for F2. NM_000506 includes coding nucleic acid sequence of F2 and also provides the translation of the coding region. Other relevant sequence information includes M17262, V00595, AF478696; AAL77436; AF493953; AAM11680; AJ544114; CAD80258; M17262; AAC63054; S50162; AAB24476; AY344793; AAR08142; AY344794; AAR08143; BC051332; AAH51332; M33031; AAA60220; V00595; CAA23842; P00734. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. The genomic sequence corresponds to positions 46697331 to 46717631 inclusive in NC_000011. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. This polymorphism is provided in the SNP database of NCBI's Entrez.

Factor V

Coagulation factor V (proaccelerin, labile factor) (F5) is a factor in the blood clotting cascade. F5 has the further aliases HGNC:3542, FVL, PCCF, factor V. F5 has the further designations activated protein c cofactor; coagulation factor V; coagulation factor V jinjiang A2 domain; factor V Leiden; labile factor. F5 has been assigned Gene ID 2153, and is positioned on chromosome 1 at locus 1q23. Further information for F5 is found on the NCBI website in the Entrez Gene database and Online Mendelian Inheritance in Man (OMIM) website under entry +227400. Polymorphic variants that can be screened for in addition to one or more of the MTHFD1 and MTHFD1L polymorphic variants relevant to the invention include the polymorphic variant described in the OMIM Factor V Deficiency +227400 entry as allelic variant 0.0001, Arg506Gln, 1691G>A, “Factor V Leiden.” The Factor V Leiden polymorphic variant was reported by Bertina et al., Nature, 369:64-67 (1994).

F5 nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, and fragments thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of F5 sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.

The following are representative sequences for F5. NM_000130 includes coding nucleic acid sequence for F5 and also provides the translation of the coding region. Other relevant sequence information includes AH005274, M14335, AF119360; AAF32515; AF285083; AAG30113; AY046060; AAL09164; AY136818; AAN12307; AY364535; AAQ55063; L32755; AAB59401; L32779; AAB59401; Z99572; CAB16748; CAI23065; AJ297254; CAC82572; AJ297255; CAC82573; M14335; AAB59532; M16967; AAA52424; M94010; AAA52416; P12259. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. The genomic sequence corresponds to positions 166287379 to 166215067 inclusive in NC_000001. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome.

TCNII

Transcobalamin II (TCNII) is a Vitamin B₁₂ binding protein. TCNII has the further aliases HGNC:11653, D22S676, D22S750, and TC2. TCNII has been assigned Gene ID 6948, and is positioned on chromosome 22 at locus 22q12.2. Further information for TCNII is found on the NCBI website in the Entrez Gene database and Online Mendelian Inheritance in Man (OMIM) website under entry +275350.

TCNII nucleic acid and amino acid sequences relevant to the invention include genomic, cDNA, fragments, and products thereof. The particular sequences identified herein by sequence identification number and/or accession number are representative of TCNII sequences. One of skill in the art can appreciate that there can be variability in the gene or gene fragment distinct from the polymorphism(s) of interest and that such allelic variants still fall within the scope of the invention. As the polymorphism will be reflected in both strands of the DNA, the screening in the context of the invention can involve one or both of the strand sequences. Accordingly, where the sequence for a given strand is provided, the invention also includes the use of its complement.

The following are representative sequences for TCNII. NM_000355 includes coding nucleic acid sequence for TCNII (SEQ ID NOS: 14 and 15, with SEQ ID NO: 15 providing the nucleic acid sequence of the coding region) and also provides the translation of the coding region (SEQ ID NO: 16). Other relevant sequence information includes AF047576; AAC05491; AF076647; AAG24506; BC001176; AAH01176; BC011239; AAH11239; CR456591; CAG30477; L02647; AAA61056; L02648; AAA61057; M60396; AAA61054; P20062; AAB25526. Screening with a fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. An example of such a fragment is provided in SEQ ID NO: 17. The genomic sequence is provided in SEQ ID NO: 18 and corresponds to positions 29327715 to 29347601 inclusive in NC_000022. Screening with a genomic fragment of at least 30 nucleic acids is within the scope of the invention; however, smaller fragments are also possible provided that they comprise the relevant polymorphism(s) and provide a sequence unique in the human genome. An example of such a fragment is also provided in SEQ ID NO: 17. SEQ ID NOS: 14-18 indicate the variability corresponding to the 776C>G polymorphism (at the nucleotide level: position 934 in SEQ ID NO: 14; position 776 in SEQ ID NO: 15; position 16 in SEQ ID NO: 17; position 8450 in SEQ ID NO: 18, corresponding to position 29336164 in the source genomic sequence) and the Pro259Arg polymorphism in the amino acid sequence (SEQ ID NOS: 14 and 16). This polymorphism is given the designation rs1801198 in the SNP database of NCBI's Entrez.
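The chromosomal coordinates and SEQ ID positions quoted for the MTHFD1, MTHFD1L, and TCNII polymorphisms are related by a simple offset. The sketch below reproduces that arithmetic, assuming 1-based, inclusive numbering on the forward strand; the function name is hypothetical.

```python
def genomic_to_seq_position(genomic_pos: int, region_start: int,
                            region_end: int) -> int:
    """Convert a chromosomal coordinate to a 1-based position within a
    sequence that spans region_start to region_end inclusive."""
    if not region_start <= genomic_pos <= region_end:
        raise ValueError("position lies outside the genomic region")
    return genomic_pos - region_start + 1


# (gene, region start, region end, polymorphism position) as quoted in the text.
regions = [
    ("MTHFD1", 63924886, 63996474, 63978638),
    ("MTHFD1L", 151278805, 151515137, 151312078),
    ("TCNII", 29327715, 29347601, 29336164),
]

for gene, start, end, pos in regions:
    print(gene, genomic_to_seq_position(pos, start, end))
```

The results agree with the positions stated for SEQ ID NO: 5 (53753), SEQ ID NO: 12 (33274), and SEQ ID NO: 18 (8450).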

The invention also includes use of other polymorphic variants of the genes and proteins described herein. Use of both the nucleic acids described herein and their complements is within the scope of the invention. In connection with the provision and description of nucleic acid sequences, the references herein to gene names and to GenBank and OMIM reference numbers provide the relevant sequences, recognizing that the described sequences will, in most cases, also have other corresponding allelic variants. Although the referenced sequences may contain sequencing error, such error does not interfere with identification of a relevant gene or portion of a gene, and can be readily corrected by redundant sequencing of the relevant sequence (preferably using both strands of DNA). Nucleic acid molecules or sequences can be readily obtained or determined utilizing the reference sequences. Molecules such as nucleic acid hybridization probes and amplification primers can be provided and are described by the selected portion of the reference sequence with correction if appropriate. In some embodiments, probes comprise 5, 6, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 25, 27, 30, 35, 40, 45, 50, or more nucleotides.

Diagnosis

The terms “disease” or “condition” are commonly recognized in the art and designate the presence of signs and/or symptoms in an individual or patient that are generally recognized as abnormal. Unless otherwise indicated, the terms “disease,” “disease state,” “condition,” “disorder,” and “complication” can be used interchangeably. Diseases or conditions can be diagnosed and categorized based on pathological changes. Signs can include any objective evidence of a disease such as changes that are evident by physical examination of a patient or the results of diagnostic tests which may include, among others, laboratory tests to determine the presence of polymorphic variants or variant forms of certain genes in a patient. Symptoms can include a patient's perception of an abnormal condition that differs from normal function, sensation, or appearance, which may include, for example, physical disabilities, morbidity, pain, and other changes from the normal condition experienced by an individual. Various diseases or conditions include, but are not limited to, those categorized in medical texts.

Unless otherwise indicated, the term “suffering from a disease or condition” can refer to a person that currently has signs and symptoms, or is more likely to develop such signs and symptoms than a normal person in the population. For example, a person suffering from a condition can include a developing fetus, a person subject to a treatment or environmental condition that enhances the likelihood of developing the signs or symptoms of a condition, or a person who is being given or will be given a treatment that increases the likelihood of the person developing a particular condition. Methods of the invention relating to treatments of patients can include primary treatments directed to a presently active disease or condition, secondary treatments that are intended to cause a biological effect relevant to a primary treatment, and prophylactic treatments intended to delay, reduce, or prevent the development of a disease or condition, as well as treatments intended to cause the development of a condition different from that which would have been likely to develop in the absence of the treatment.

Combined detection of several such polymorphic variants typically increases the probability of an accurate diagnosis. Analysis of the polymorphisms of the invention can be combined with that of other polymorphisms or other risk factors such as family history. Polymorphisms can be used to diagnose a disease at the pre-symptomatic stage, as a method of post-symptomatic diagnosis, as a method of confirmation of diagnosis or as a post-mortem diagnosis. Ethical issues to be considered in screening and diagnosis are discussed generally in Reich, et al., Genet. Med., 5:133-143 (2003).

Pregnancy-Related Complications

Pregnancy-related complications include not just complications that occur during the course of pregnancy, but also include infertility complications. That is, a pregnancy-related complication can also, or in the alternative, involve a complication that prevents pregnancy from occurring or diminishes the probability that pregnancy will occur. Accordingly, the polymorphic variants relevant to the invention can be correlated with infertility. Particular pregnancy-related complications are described as follows without limitation to other relevant pregnancy-related complications correlating with one or more polymorphic variants relevant to the invention. Screening for polymorphic variants in the context of pregnancy-related complications can include screening of the mother as well as the father and the unborn child(ren). Both males and females can be screened using the methods of the invention. A female screened may be of any age, born or unborn, and need not be pregnant when screened. In some embodiments, the subject screened is female and has had complications becoming pregnant, which can include any number of different infertility factors. In some embodiments, the woman screened has been pregnant previously but has suffered complications during pregnancy. In some embodiments, the woman screened is pregnant, but is not carrying an embryo or fetus with a neural tube defect. The sample screened from any subject may be derived from any number of different sources such as cells, tissues, and organs. In some embodiments, the sample comprises blood. In some embodiments, the sample comprises an egg and/or sperm. In some embodiments, the sample screened comprises a somatic cell.

Placental Abruption

The diagnosis of abruptio placentae or placental abruption can be based on hemorrhage and accumulation of blood between the placenta and the wall of the uterus. In some embodiments, diagnosis is based on a sudden rupture of the spiral arteries, resulting in the premature separation of a normally implanted placenta. Severe placental abruption is generally characterized by more extensive manifestations of placental abruption and can also comprise worse clinical outcomes such as death of the mother and/or children. In some embodiments, severe placental abruption is diagnosed based on a retroplacental clot and/or accidental haemorrhage with associated clinical signs of abruption and/or a statement in the case records that the patient was a definite case of abruptio placentae. Data on gestational age at delivery, maternal hypertension, maternal blood transfusion, and pregnancy outcome can be collected. Control pregnancies can be selected from women with no history of abruptio placentae, and can be matched for the same date and clinic as the cases where the genetically tested blood sample was provided.

Diagnosis for an increased susceptibility for severe placental abruption is rendered when a particular polymorphic variant that has been correlated with severe placental abruption is identified. In some embodiments, the polymorphic variant is an adenosine at position 1958 of MTHFD1. In some embodiments, the subject tested is homozygous for the 1958A variant; in some embodiments, the subject is heterozygous for the 1958A variant.
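
The distinction between homozygous and heterozygous subjects can be illustrated by counting copies of the variant allele in a two-allele genotype call. This is a minimal sketch only; the function name and the genotype representation are hypothetical, and the non-variant allele is written as G purely for illustration.

```python
def zygosity(genotype, variant="A"):
    """Classify a two-allele genotype, e.g. ("G", "A"), for a variant allele."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles")
    # Count how many of the two alleles carry the variant base.
    copies = sum(1 for allele in genotype if allele.upper() == variant)
    return {0: "non-carrier", 1: "heterozygous", 2: "homozygous"}[copies]
```

For example, a subject called as ("A", "A") at position 1958 would be classified as homozygous for the 1958A variant, and a subject called as ("G", "A") as heterozygous.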

Miscarriage

Miscarriage is the loss of one or more children before birth. In some embodiments, the miscarriage occurs in the second trimester. In other embodiments, the miscarriage occurs in the first or third trimester. In some embodiments, the miscarriage has no clinical explanation. A miscarriage can comprise a spontaneous abortion and/or fetal death.

Diagnosis for an increased susceptibility for miscarriage is rendered when a particular polymorphic variant that has been correlated with miscarriage is identified. This correlation can be with first, second, and/or third trimester miscarriage. In some embodiments, the correlation is with unexplained second trimester miscarriage. In some embodiments, the polymorphic variant is an adenosine at position 1958 of MTHFD1. In some embodiments, the subject tested is homozygous for the 1958A variant; in some embodiments, the subject is heterozygous for the 1958A variant.

Neural Tube Defects

Neural tube defects include, for example, anencephaly, encephalocele, iniencephaly, and spina bifida, and are diagnosed by symptoms commonly accepted in the medical field. Diagnosis for an increased susceptibility for a neural tube defect is rendered when a particular polymorphic variant that has been correlated with a neural tube defect is identified. In some embodiments, the diagnosis of increased susceptibility for a neural tube defect is rendered when a 7-repeat variant of the MTHFD1L ATT polymorphism rs3832406 (position 55374834 in NT_025741) is identified. In some embodiments, the subject tested is homozygous for the 7-repeat ATT polymorphism; in some embodiments, the subject is heterozygous for the 7-repeat ATT polymorphism. In some embodiments, one or two copies of an 8-repeat variant of the MTHFD1L polymorphism rs3832406 are identified, wherein the 8-repeat variant is correlated with a protective effect, that is, a decreased susceptibility for an NTD. In some embodiments, diagnosis is based not only on a polymorphic variant in the MTHFD1L gene, but also on the MTHFD1 1958A variant. In some embodiments, the subject tested is homozygous for the 1958A variant; in some embodiments, the subject is heterozygous for the 1958A variant.
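
Distinguishing a 7-repeat from an 8-repeat ATT allele amounts to counting tandem copies of the repeat unit in a sequenced allele. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the flanking bases used in any example input are invented placeholders rather than the actual MTHFD1L sequence, and the read is assumed to be error-free.

```python
import re


def longest_tandem_run(sequence, unit="ATT"):
    """Return the number of units in the longest uninterrupted tandem run."""
    # Find every maximal stretch made up only of back-to-back copies of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)
```

For example, a hypothetical read such as "GC" followed by seven copies of ATT and then "GCA" would be counted as a 7-repeat allele.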

Neoplastic Diseases

Diagnosis for an increased susceptibility for a drug dosage complication can be rendered based on a polymorphic variant in MTHFD1L that has been correlated with such a complication. In some embodiments, the increased susceptibility for a neural tube defect is rendered when a 7-repeat variant of the MTHFD1L ATT polymorphism rs3832406 (position 55374834 in NT_025741) is identified. In some embodiments, the subject tested is homozygous for the 7-repeat ATT polymorphism; in some embodiments, the subject is heterozygous for the 7-repeat ATT polymorphism.

As used herein, by the term “cancer” is meant any malignant growth or tumor caused by abnormal and uncontrolled cell division that may spread to other parts of the body through the lymphatic system or the blood stream. The cancer can be, for example, breast cancer, prostate cancer, lung cancer, colon cancer, rectal cancer, urinary bladder cancer, non-Hodgkin lymphoma, melanoma, renal cancer, pancreatic cancer, cancer of the oral cavity, pharynx cancer, ovarian cancer, thyroid cancer, stomach cancer, brain cancer, multiple myeloma, esophageal cancer, liver cancer, cervical cancer, larynx cancer, cancer of the intrahepatic bile duct, acute myeloid leukemia, soft tissue cancer, small intestine cancer, testicular cancer, chronic lymphocytic leukemia, Hodgkin lymphoma, chronic myeloid cancer, acute lymphocytic cancer, cancer of the anus, anal canal, or anorectum, cancer of the vulva or cancer of the neck, gallbladder, pleura, malignant mesothelioma, bone cancer, cancer of the joints, hypopharynx cancer, cancer of the eye, cancer of the nose, nasal cavity, neck, or middle ear, nasopharynx cancer, ureter cancer, peritoneum, omentum, or mesentery cancer, or gastrointestinal carcinoid tumor.

Those skilled in the art will understand whether the polymorphic variants or gene forms in normal or disease cells are most indicative of the expected treatment response, and will generally utilize a diagnostic test with respect to the appropriate cells. Such a cell type indication or suggestion can be contained in a regulatory statement, for example, on a label or in a product insert.

Alzheimer's Disease

Intermediates and defects in one carbon metabolic pathways have been shown to play a role in Alzheimer's disease. Accordingly, the invention includes methods of and materials for predicting altered susceptibility to Alzheimer's disease and response to Alzheimer's disease therapeutic agents based on correlation with the polymorphic variants discussed herein. The invention is also relevant to other central nervous system (CNS) diseases and therapeutic agents.

Cardiovascular Disease

Intermediates and defects in one carbon metabolic pathways have been shown to play a role in some cardiovascular diseases. Accordingly, the invention includes methods of and materials for predicting altered susceptibility to cardiovascular disease and response to cardiovascular disease therapeutic agents based on correlation with the polymorphic variants discussed herein.

Detection Probes

The detection of the presence or absence of a polymorphic variant can involve contacting a nucleic acid sequence corresponding to one of the genes identified above or a product of such a gene with a probe. The probe is able to distinguish a particular form of the gene, gene product, polymorphic variant allele product, or allele product, or the presence of a particular polymorphic variant or polymorphic variants, for example, by differential binding or hybridization. The term “probe” refers to a molecule that can detectably distinguish between target molecules differing in structure. Detection can be accomplished in a variety of different ways depending on the type of probe used and the type of target molecule. Thus, for example, detection may be based on discrimination of activity levels of the target molecule, but preferably is based on detection of specific binding. Examples of such specific binding include antibody binding and nucleic acid probe hybridization. Probes can comprise one or more of the following: a protein, carbohydrate, polymer, or small molecule that is capable of binding to one polymorphic variant or variant form of the gene or gene product to a greater extent than to a form of the gene having a different base at one or more polymorphic variant sites, such that the presence of the polymorphic variant or variant form of the gene can be determined. A probe can incorporate one or more markers including, but not limited to, radioactive labels, such as radionuclides, fluorophores or fluorochromes, peptides, enzymes, antigens, antibodies, vitamins or steroids. A probe can distinguish at least one of the polymorphic variants described herein. The probe can also have specificity for the particular gene or gene product, at least to an extent such that binding to other genes or gene products does not prevent use of the assay to identify the presence or absence of the particular polymorphic variant or polymorphic variants of interest.

Nucleic Acids

The nucleic acid molecules relevant to the invention can readily be obtained in a variety of ways, including, without limitation, chemical synthesis, cDNA or genomic library screening, expression library screening, and/or PCR amplification of cDNA. These methods and others useful for isolating such DNA are set forth, for example, by Sambrook, et al., “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); by Ausubel, et al., eds., “Current Protocols In Molecular Biology,” Current Protocols Press (1994); and by Berger and Kimmel, “Methods In Enzymology: Guide To Molecular Cloning Techniques,” vol. 152, Academic Press, Inc., San Diego, Calif. (1987). Nucleic acid sequences are mammalian sequences. In some embodiments, the nucleic acid sequences are human, rat, or mouse sequences.

Chemical synthesis of a nucleic acid molecule can be accomplished using methods well known in the art, such as those set forth by Engels et al., Angew. Chem. Intl. Ed., 28:716-734 (1989). These methods include, inter alia, the phosphotriester, phosphoramidite and H-phosphonate methods of nucleic acid synthesis. Nucleic acids larger than about 100 nucleotides in length can be synthesized as several fragments, each fragment being up to about 100 nucleotides in length. The fragments can then be ligated together to form a full length nucleic acid encoding the polypeptide. A preferred method is polymer-supported synthesis using standard phosphoramidite chemistry.
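
The division of a long nucleic acid into fragments of up to about 100 nucleotides can be sketched as a simple partition of the sequence. This is an illustration only; the function name is hypothetical, and the sketch deliberately ignores any overlaps or compatible ends that a particular ligation strategy may require.

```python
def split_for_synthesis(sequence, max_len=100):
    """Partition a sequence into consecutive fragments of at most max_len bases."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # Consecutive, non-overlapping slices; the last one may be shorter.
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]
```

Joining the returned fragments in order reproduces the original sequence, which corresponds to the ligation step described above.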

Alternatively, the nucleic acid may be obtained by screening an appropriate cDNA library prepared from one or more tissue source(s) that express the polypeptide, or a genomic library from any subspecies. The source of the genomic library may be any tissue or tissues from any mammalian or other species believed to harbor a gene encoding a protein relevant to the invention. The library can be screened for the presence of a cDNA/gene using one or more nucleic acid probes (oligonucleotides, cDNA or genomic DNA fragments that possess an acceptable level of homology to the gene or gene homologue cDNA or gene to be cloned) that will hybridize selectively with the gene or gene homologue cDNA(s) or gene(s) that is(are) present in the library. The probes preferably are complementary to or encode a small region of the DNA sequence from the same or a similar species as the species from which the library was prepared. Alternatively, the probes may be degenerate, as discussed below. After hybridization, the blot containing the library is washed at a suitable stringency, depending on several factors such as probe size, expected homology of probe to clone, type of library being screened, number of clones being screened, and the like. Stringent washing solutions are usually low in ionic strength and are used at relatively high temperatures.

Another suitable method for obtaining a nucleic acid in accordance with the invention is the polymerase chain reaction (PCR). In this method, poly(A)+RNA or total RNA is extracted from a tissue that expresses the gene product. cDNA is then prepared from the RNA using the enzyme reverse transcriptase. Two primers typically complementary to two separate regions of the cDNA (oligonucleotides) are then added to the cDNA along with a polymerase such as Taq polymerase, and the polymerase amplifies the cDNA region between the two primers.
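
The region amplified between two primers can be predicted computationally before any laboratory work. The following is a minimal sketch that assumes exact primer matches on a single template strand; the function names are hypothetical and no melting-temperature or mismatch modeling is attempted.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def amplicon(template, forward, reverse):
    """Return the template region bounded by two primers, or None if not found.

    The forward primer is expected to match the template as written. The
    reverse primer is given 5'->3' and anneals to the template strand, so its
    reverse complement is what appears in the template.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # Search only downstream of the forward primer binding site.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

The returned string includes both primer binding sites, as a real amplification product would.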

The invention provides for the use of isolated, purified or enriched nucleic acid sequences of 15 to 500 nucleotides in length, 15 to 100 nucleotides in length, 15 to 50 nucleotides in length, and 15 to 30 nucleotides in length, which have sequence that corresponds to a portion of one of the genes identified for aspects above. In some embodiments the nucleic acid is at least 17, 20, 22, or 25 nucleotides in length. In some embodiments, the nucleic acid sequence is 30 to 300 nucleotides in length, or 45 to 200 nucleotides in length, or 45 to 100 nucleotides in length. In some embodiments, the probe is a nucleic acid probe at least 15, 17, 20, 22, 25, 30, 35, 40, or more nucleotides in length, or 500, 250, 200, 100, 50, 40, 30 or fewer nucleotides in length. In preferred embodiments, the probe has a length in a range from any one of the above lengths to any other of the above lengths including endpoints. The nucleic acid sequence includes at least one polymorphic variant site. Such sequences can, for example, be amplification products of a sequence that spans or includes a polymorphic variant site in a gene identified herein. A nucleic acid with such a sequence can be utilized as a primer or amplification oligonucleotide that is able to bind to or extend through a polymorphic variant site in such a gene. Another example is a nucleic acid hybridization probe comprised of such a sequence. In such probes, primers, and amplification products, the nucleotide sequence can contain a sequence or site corresponding to a polymorphic variant site or sites, for example, a polymorphic variant site identified herein. The design and use of allele-specific probes for analyzing polymorphisms is known generally in the art, see, for example, Saiki et al., Nature, 324:163-166 (1986); Dattagupta, EP 235,726, Saiki, WO 89/11548. 
Allele-specific probes can be designed that hybridize to a segment of target DNA from one individual but do not hybridize to the corresponding segment from another individual due to the presence of different polymorphic forms in the respective segments from the two individuals. A nucleic acid hybridization probe may span two or more polymorphic variant sites. Unless otherwise specified, a nucleic acid probe can include one or more nucleic acid analogs, labels or other substituents or moieties so long as the base-pairing function is retained. The probe may also comprise a detectable label, such as a radioactive or fluorescent label. A variety of other detectable labels are known to those skilled in the art.

In connection with nucleic acid probe hybridization, the term “specifically hybridizes” indicates that the probe hybridizes to a sufficiently greater degree to the target sequence than to a sequence having a mismatched base at at least one polymorphic variant site to allow distinguishing of such hybridization. The term “specifically hybridizes” means that the probe hybridizes to the target sequence, and not to non-target sequences, at a level which allows ready identification of probe/target sequence hybridization under selective hybridization conditions. “Selective hybridization conditions” refer to conditions that allow such differential binding. Similarly, the terms “specifically binds” and “selective binding conditions” refer to such differential binding of any type of probe, and to the conditions that allow such differential binding. Hybridization reactions to determine the status of variant sites in patient samples can be carried out with two different probes, one specific for each of the possible variant nucleotides. The complementary information derived from the two separate hybridization reactions is useful in corroborating the results.

A variety of variables can be adjusted to optimize the discrimination between two variant forms of a gene, including changes in salt concentration, temperature, pH and addition of various compounds that affect the differential affinity of GC vs. AT base pairs, such as tetramethyl ammonium chloride. [See, Current Protocols in Molecular Biology, Ausubel et al. (Editors), John Wiley & Sons.] Hybridization conditions should be sufficiently stringent such that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Hybridizations are usually performed under stringent conditions that allow for specific binding between an oligonucleotide and a target nucleic acid containing one of the polymorphic sites described herein or identified using the techniques described herein. Stringent conditions are defined as any suitable buffer concentrations and temperatures that allow specific hybridization of the oligonucleotide to highly homologous sequences spanning at least one polymorphic site and any washing conditions that remove non-specific binding of the oligonucleotide. For example, conditions of 5×SSPE (750 mM NaCl, 50 mM Na Phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C. are suitable for allele-specific probe hybridizations. The washing conditions usually range from room temperature to 60° C. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position of the probe. This probe design achieves good discrimination in hybridization between different allelic forms.

Allele-specific probes can be used in pairs, one member of a pair showing a perfect match to a reference form of a target sequence and the other member showing a perfect match to a variant form. Several pairs of probes can then be immobilized on the same support for simultaneous analysis of multiple polymorphisms within the same target sequence. The polymorphisms can also be identified by hybridization to nucleic acid arrays, some examples of which are described by WO 95/11995. Arrays may be provided in the form of a multiplex chip.
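
A pair of allele-specific probes, one matching the reference form and one matching the variant form, with the polymorphic site at a central position, can be derived from a reference sequence as sketched below. The function name, the 0-based site index, and the default probe length of 15 are assumptions made for illustration, not parameters taken from any particular assay.

```python
def allele_specific_probes(reference, site, variant_base, length=15):
    """Build a (reference probe, variant probe) pair centered on a 0-based site."""
    half = length // 2
    start = site - half
    end = start + length
    if start < 0 or end > len(reference):
        raise ValueError("site is too close to an end of the sequence")
    ref_probe = reference[start:end]
    # Position of the polymorphic site within the probe.
    offset = site - start
    var_probe = ref_probe[:offset] + variant_base + ref_probe[offset + 1:]
    return ref_probe, var_probe
```

The two probes returned differ at exactly one position, which for an odd probe length is the middle base.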

One use of probe(s) is as a primer(s) that hybridizes to a nucleic acid sequence containing at least one sequence polymorphic variant. Preferably such primers hybridize to a sequence not more than 300 nucleotides, more preferably not more than 200 nucleotides, still more preferably not more than 100 nucleotides, and most preferably not more than 50 nucleotides away from a polymorphic variant site which is to be analyzed. Preferably, a primer is 100 nucleotides or fewer in length, more preferably 50 nucleotides or fewer, still more preferably 30 nucleotides or fewer, and most preferably 20 or fewer nucleotides in length. In some embodiments, a set of primers includes primers or amplification oligonucleotides adapted to bind to or extend through a plurality of sequence polymorphic variants in a gene(s) identified herein. In some embodiments, the plurality of polymorphic variants comprises a haplotype. In certain embodiments, the oligonucleotides are designed and selected to provide polymorphic variant-specific amplification.
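
The graded preferences for primer distance and primer length stated above can be expressed as threshold tables. The following sketch is illustrative only; the table and function names are hypothetical, and the tier labels simply restate the wording of the preceding paragraph.

```python
# Upper limits (in nucleotides) paired with the preference wording from the text,
# ordered from the tightest limit to the loosest.
DISTANCE_TIERS = [(50, "most preferred"), (100, "still more preferred"),
                  (200, "more preferred"), (300, "preferred")]
LENGTH_TIERS = [(20, "most preferred"), (30, "still more preferred"),
                (50, "more preferred"), (100, "preferred")]


def preference_tier(value, tiers):
    """Return the label of the tightest tier whose limit is not exceeded."""
    for limit, label in tiers:
        if value <= limit:
            return label
    return "outside the stated preferences"
```

For example, a 25-nucleotide primer binding 150 nucleotides from the polymorphic variant site would fall in the "still more preferred" length tier and the "more preferred" distance tier.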

Proteins and Expression of Nucleic Acids

Polymorphic variant alleles or fragments thereof can be expressed in an expression vector in which a variant gene is operably linked to a native or other promoter. Usually, the promoter is a eukaryotic promoter for expression in a mammalian cell. The transcription regulation sequences typically include a heterologous promoter and optionally an enhancer that is recognized by the host. The selection of an appropriate promoter, for example trp, lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host selected. Commercially available expression vectors can be used. Vectors can include host-recognized replication systems, amplifiable genes, selectable markers, host sequences useful for insertion into the host genome, and the like.

The means of introducing the expression construct into a host cell varies depending upon the particular construction and the target host. Suitable means include fusion, conjugation, transfection, transduction, electroporation or injection, as described in Sambrook, supra. A wide variety of host cells can be employed for expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives thereof. Host cells can be selected to process the variant gene product to produce an appropriate mature polypeptide. Processing includes glycosylation, ubiquitination, disulfide bond formation, and general post-translational modification.

The protein can be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is secreted, it can be isolated from the supernatant in which the host cell is grown. If not secreted, the protein can be isolated from a lysate of the host cells.

In addition to substantially full-length polypeptides expressed by variant genes, the invention includes use of biologically active fragments of the polypeptides, or analogs thereof, including organic molecules that simulate the interactions of the peptides. Biologically active fragments include any portion of the full-length polypeptide that confers a biological function on the variant gene product, including ligand binding and antibody binding. Ligand binding includes binding by nucleic acids, proteins or polypeptides, small biologically active molecules or large cellular structures.

Antibodies

Another type of probe is a peptide or protein, for example, an antibody or antibody fragment that specifically or preferentially binds to a polypeptide expressed by a particular form of a gene as characterized by the presence or absence of at least one polymorphic variant. Such antibodies may be polyclonal or monoclonal antibodies, and can be prepared by methods well-known in the art.

Antibodies can be used to probe for presence of a given polymorphism variant for those polymorphism variants that have an effect on the polypeptide encoded by the gene. For example, an antibody can recognize a change in one or more amino acid residues in the resulting protein. In some embodiments, the antibody is used to recognize polypeptides encoded by differential splice variants. If the polymorphism introduces or eliminates a surface feature of the protein such as a glycosylation site, lipid modification, etc., an antibody can also be used to identify a particular variant.

Polyclonal and/or monoclonal antibodies and antibody fragments capable of binding to a portion of the gene product relevant for identifying a given polymorphism variant are provided. Antibodies can be made by injecting mice or other animals with the variant gene product or synthetic peptide fragments thereof. Monoclonal antibodies are screened as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies, Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal antibodies are tested for specific immunoreactivity with a variant gene product and lack of immunoreactivity to the corresponding prototypical gene product. These antibodies are useful in diagnostic assays for detection of the variant form, or as an active ingredient in a pharmaceutical composition.

Polyclonal or monoclonal therapeutic antibodies useful in practicing this invention can be prepared in laboratory animals or by recombinant DNA techniques using the following methods. Polyclonal antibodies are raised in animals by multiple subcutaneous (sc) or intraperitoneal (ip) injections of the gene product molecule or fragment thereof in combination with an adjuvant such as Freund's adjuvant (complete or incomplete). To enhance immunogenicity, it may be useful to first conjugate the gene product molecule or a fragment containing the target amino acid sequence to a protein that is immunogenic in the species to be immunized, e.g., keyhole limpet hemocyanin, serum albumin, bovine thyroglobulin, or soybean trypsin inhibitor using a bifunctional or derivatizing agent, for example, maleimidobenzoyl sulfosuccinimide ester (conjugation through cysteine residues), N-hydroxysuccinimide (through lysine residues), glutaraldehyde, succinic anhydride, SOCl₂, or R¹N═C═NR, where R and R¹ are different alkyl groups. Alternatively, immunogenic conjugates can be produced recombinantly as fusion proteins.

Animals are immunized against the immunogenic conjugates or derivatives (such as a fragment containing the target amino acid sequence) by combining about 1 mg or about 1 microgram of conjugate (for rabbits or mice, respectively) with about 3 volumes of Freund's complete adjuvant and injecting the solution intradermally at multiple sites. Approximately 7 to 14 days later, animals are bled and the serum is assayed for antibody titer. Animals are boosted with antigen repeatedly until the titer plateaus. The animal can be boosted with the same molecule or fragment thereof as was used for the initial immunization, but conjugated to a different protein and/or through a different cross-linking agent. In addition, aggregating agents such as alum are used in the injections to enhance the immune response.

Monoclonal antibodies can be prepared by recovering spleen cells from immunized animals and immortalizing the cells in conventional fashion, e.g. by fusion with myeloma cells. The clones are then screened for those expressing the desired antibody. The monoclonal antibody preferably does not cross-react with other gene products.

Preparation of antibodies using recombinant DNA methods, such as the phagemid display method, may be accomplished using commercially available kits, for example, the Recombinant Phagemid Antibody System available from Pharmacia (Uppsala, Sweden), or the SurfZAP™ phage display system (Stratagene Inc., La Jolla, Calif.).

Bispecific antibodies that specifically bind to one protein and that specifically bind to other antigens relevant to pathology and/or treatment are produced, isolated, and tested using standard procedures that have been described in the literature. [See, e.g., Pluckthun & Pack, Immunotechnology, 3:83-105 (1997); Carter, et al., J. Hematotherapy, 4:463-470 (1995); Renner & Pfreundschuh, Immunological Reviews, 1995, No. 145, pp. 179-209; Pfreundschuh U.S. Pat. No. 5,643,759; Segal, et al., J. Hematotherapy, 4:377-382 (1995); Segal, et al., Immunobiology, 185:390-402 (1992); and Bolhuis, et al., Cancer Immunol. Immunother., 34: 1-8 (1991).]

Transgenic Animals

The invention further provides the making and use of transgenic nonhuman animals capable of expressing an exogenous variant gene and/or having one or both alleles of an endogenous variant gene inactivated. Expression of an exogenous variant gene is usually achieved by operably linking the gene to a promoter and optionally an enhancer, and microinjecting the construct into a zygote. [See, Hogan et al., “Manipulating the Mouse Embryo, A Laboratory Manual,” Cold Spring Harbor Laboratory.] Inactivation of endogenous variant genes can be achieved by forming a transgene in which a cloned variant gene is inactivated by insertion of a positive selection marker. [See, Capecchi, Science, 244:1288-1292 (1989).] The transgene is then introduced into an embryonic stem cell, where it undergoes homologous recombination with an endogenous variant gene. Mice and other rodents are preferred animals. Such animals provide useful drug screening systems.

The nucleic acids relevant to the invention can be used to generate genetically modified non-human animals or site specific gene modifications in cell lines. The term “transgenic” is intended to encompass genetically modified animals having a deletion or other knock-out of an endogenous gene, having an exogenous allele that is stably transmitted in the host cells, and/or having an exogenous allele promoter operably linked to a reporter gene. Transgenic animals may be made through homologous recombination, where the allele locus is altered. Alternatively, a nucleic acid construct is randomly integrated into the genome. Vectors for stable integration include plasmids, retroviruses and other animal viruses, YACs, and the like. Transgenic mammals of relevance include cows, pigs, goats, horses, etc., and particularly rodents, e.g. rats, mice, etc.

Transgenic animals can be made having exogenous genes comprising the polymorphic variants relevant to the invention so as to “humanize” the animal with respect to that gene(s); such a process involves deletion of the analogous endogenous gene when appropriate. The exogenous gene is usually either from a different species than the animal host, or is otherwise altered in its coding or non-coding sequence. The introduced gene can be a wild-type gene, naturally occurring polymorphism, or a genetically manipulated sequence, for example those previously described with deletions, substitutions or insertions in the coding or non-coding regions. Where the introduced gene is a coding sequence, it is usually operably linked to a promoter, which may be constitutive or inducible, and other regulatory sequences required for expression in the host animal. A detectable marker, such as lac Z, can be introduced together with the exogenous gene to demonstrate incorporation of the exogenous gene.

The modified cells or animals are useful in the study of the physiological effect, if any, of the polymorphic variant. Animals can be used in functional studies, drug screening, etc., for example, to determine the effect of a candidate drug. By providing expression of a polymorphic variant in cells in which it is otherwise not normally produced, one can induce changes in cell behavior. Transgenic animals are also useful as part of a preclinical program.

DNA constructs for homologous recombination can comprise at least a portion of the polymorphic variant with the desired genetic modification, and can include regions of homology to the target locus. DNA constructs for random integration need not include regions of homology to mediate recombination. Conveniently, markers for positive and negative selection can be included. Methods for generating cells having targeted gene modifications through homologous recombination are known in the art. For various techniques for transfecting mammalian cells, see Keown et al., Methods in Enzymology 185:527-537 (1990).

Screening Techniques for Identifying Polymorphic Variants

The molecules and probes relevant to the invention can be used in screening techniques. A variety of screening techniques are known in the art for detecting the presence of one or more copies of one or more polymorphic variants in a sample or from a subject. Many of these assays have been reviewed by Landegren et al., Genome Res., 8:769-776, 1998. Polymorphic variants within a particular nucleotide sequence among a population can be determined by any method known in the art, for example and without limitation, direct sequencing, restriction fragment length polymorphism (RFLP), single-strand conformational analysis (SSCA), denaturing gradient gel electrophoresis (DGGE) [see, e.g., Van Orsouw et al., Genet Anal., 14(5-6):205-13 (1999)], heteroduplex analysis (HET) [see, e.g., Ganguly A, et al., Proc. Natl. Acad. Sci. (USA), 90(21):10325-9 (1993)], chemical cleavage analysis (CCM) [see, e.g., Ellis T P, et al., Human Mutation, 11(5):345-53 (1998)] (either enzymatic as with T4 Endonuclease 7, or chemical as with osmium tetroxide and hydroxylamine) and ribonuclease cleavage. Screening for polymorphic variants can be performed when a polymorphic variant is already known to be associated with a particular disease or condition. In some embodiments, the screening is performed in pursuit of identifying one or more polymorphic variants and determining whether they are associated with a particular disease or condition.

In respect to DNA, polymorphic variant screening can include genomic DNA screening and/or cDNA screening. Genomic polymorphic variant detection can include screening the entire genomic segment spanning the gene from the transcription start site to the polyadenylation site. In some embodiments, genomic polymorphic variant detection can include the exons and some region around them containing the splicing signals, for example, but not all of the intronic sequences. In addition to screening introns and exons for polymorphic variants, regulatory DNA sequences can be screened for polymorphic variants. Promoter, enhancer, silencer and other regulatory elements have been described in human genes. The promoter is generally proximal to the transcription start site, although there may be several promoters and several transcription start sites. Enhancer, silencer and other regulatory elements can be intragenic or can lie outside the introns and exons, possibly at a considerable distance, such as 100 kb away. Polymorphic variants in such sequences can affect basal gene expression or regulation of gene expression.

The presence or absence of the at least one polymorphic variant can be determined by nucleotide sequencing. Sequencing can be carried out by any suitable method, for example, dideoxy sequencing [Sanger et al., Proc. Natl. Acad. Sci. (USA), 74:5463-5467 (1977)], chemical sequencing [Maxam and Gilbert, Proc. Natl. Acad. Sci. (USA), 74:560-564, (1977)] or variations thereof. Methods for sequencing can also be found in Ausubel et al., eds., Short Protocols in Molecular Biology, 3rd ed., Wiley, 1995 and Sambrook et al., Molecular Cloning, 2nd ed., Ch. 13, Cold Spring Harbor Laboratory Press, 1989. The sequencing can involve sequencing of a portion or portions of a gene and/or portions of a plurality of genes that includes at least one polymorphic variant site, and can include a plurality of such sites. The portion can be of sufficient length to discern whether the polymorphic variant(s) of interest is present. In some embodiments the portion is 500, 250, 100, 75, 65, 50, 45, 35, 25 nucleotides or less in length. Sequencing can also include the use of dye-labeled dideoxy nucleotides, and the use of mass spectrometric methods. Mass spectrometric methods can also be used to determine the nucleotide present at a polymorphic variant site.

RFLP analysis is useful for detecting the presence of genetic variants at a locus in a population when the variants differ in the size of a probed restriction fragment within the locus, such that the difference between the variants can be visualized by electrophoresis [see, e.g. U.S. Pat. Nos. 5,324,631 and 5,645,995]. Such differences will occur when a variant creates or eliminates a restriction site within the probed fragment. RFLP analysis is also useful for detecting a large insertion or deletion within the probed fragment. RFLP analysis is useful for detecting, for example, an Alu or other sequence insertion or deletion.
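By way of illustration only, the effect of a variant that eliminates a restriction site can be sketched in a few lines of Python. The 60-nucleotide sequences and the function name `digest` below are hypothetical and chosen for the example; the recognition site GAATTC, cut after the first G, corresponds to EcoRI. The sketch models fragment lengths on one strand only and is not intended as a complete digestion model.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    cut_offset is the number of bases into the recognition site after
    which the enzyme is modeled to cut (1 for EcoRI: G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical probed fragments differing by a single base within the site.
allele_a = "A" * 20 + "GAATTC" + "C" * 34   # site present
allele_b = "A" * 20 + "GAGTTC" + "C" * 34   # site eliminated by the variant

print(digest(allele_a, "GAATTC", 1))  # [21, 39]
print(digest(allele_b, "GAATTC", 1))  # [60]
```

The two alleles give different fragment patterns (two fragments versus one uncut fragment), which is the size difference that electrophoresis would resolve.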

Single-strand conformational polymorphisms (SSCPs) can be detected in <220 bp PCR amplicons with high sensitivity [Orita et al, Proc. Natl. Acad. Sci. (USA), 86:2766-2770, 1989; Warren et al., In: Current Protocols in Human Genetics, Dracopoli et al., eds, Wiley, 1994, 7.4.1-7.4.6.]. Double strands are first heat-denatured. The single strands are then subjected to polyacrylamide gel electrophoresis under non-denaturing conditions at constant temperature with low voltage and long run times at two different temperatures, typically 4-10° C. and 23° C., or appropriate ambient temperature. At low temperatures such as 4-10° C., the secondary structure of short single strands, i.e., the degree of intrachain hairpin formation, is sensitive to even single nucleotide changes, and can be detected as a large change in electrophoretic mobility. Polymorphisms appear as new banding patterns when the gel is stained.

SSCP is usually paired with a DNA sequencing method, because the SSCP method does not provide the nucleotide identity of polymorphic variants. One useful sequencing method, for example, is DNA cycle sequencing of radiolabeled PCR products using the Femtomole DNA cycle sequencing kit from Promega (WI) and the instructions provided with the kit. Fragments are selected for DNA sequencing based on their behavior in the SSCP assay. Single strand conformation polymorphism screening is a widely used technique for identifying and discriminating DNA fragments that differ from each other by as little as a single nucleotide. The SSCP technique can be used on genomic DNA [Orita et al. Proc. Natl. Acad. Sci. (USA), 86(8):2766-70, 1989] as well as on PCR amplified DNA.

The basic steps of the SSCP procedure can be as follows. SSCP can be used to analyze cDNAs or genomic DNAs. If cDNA is used, any suitable reverse transcriptase procedure and/or kit may be utilized, such as a Superscript II kit from Life Technologies. Material for SSCP analysis can be prepared by PCR amplification of the cDNA in the presence of radiolabeled dNTP, such as dCTP. Usually the concentration of nonradioactive dCTP is dropped from 200 μM (the standard concentration for each of the four dNTPs) to about 100 μM, and α-³²P dCTP is added to a concentration of about 0.1-0.3 μM. This process involves adding 0.3-1 μl (3-10 μCi) of ³²P dCTP to a 10 μl PCR reaction. In some embodiments, about 200 base pair PCR products are used for SSCP. In some embodiments, about 0.8-1.4 kb fragments are amplified and then several cocktails of restriction endonucleases are used to digest those into smaller fragments of about 0.1-0.3 kb, aiming to have as many fragments as possible between 0.15 and 0.3 kb. In some embodiments, several different restriction enzyme digests can be performed on each set of samples, and then each of the digests can be run separately on SSCP gels. After digestion, the radiolabelled PCR products are diluted 1:5 by adding formamide load buffer (80% formamide, 1×SSCP gel buffer) and then denatured by heating to 90° C. for 10 minutes, and then allowed to renature by quickly chilling on ice. The secondary structure of the single strands influences their mobility on nondenaturing gels. Even single base differences consistently produce changes in intrastrand folding sufficient to register as mobility differences on SSCP. The resulting single strands are resolved on one or more gels, one a 5.5% acrylamide, 0.5×TBE gel, the other an 8% acrylamide, 10% glycerol, 1×TTE gel, or other appropriate gel recipe known in the art. The use of two gels provides a greater opportunity to recognize mobility differences. Both glycerol and acrylamide concentration have been shown to influence SSCP performance.
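As a rough arithmetic check of the labeling conditions described above, the molar concentration contributed by the radiolabeled dCTP can be estimated from the activity added. The sketch below assumes a specific activity of 3000 Ci/mmol, which is a value not stated in this description and is used here only as an illustrative assumption, and it treats 10 μl as the final reaction volume. The function name is hypothetical.

```python
def labeled_dctp_micromolar(activity_uci, reaction_volume_ul,
                            specific_activity_ci_per_mmol=3000.0):
    """Estimate the concentration (in uM) of labeled dCTP in a reaction.

    The default specific activity is an assumption made for illustration.
    Ci/mmol is numerically equal to uCi/nmol, so dividing the activity in
    uCi by the specific activity gives the amount in nmol.
    """
    nmol = activity_uci / specific_activity_ci_per_mmol
    pmol = nmol * 1000.0
    return pmol / reaction_volume_ul   # pmol per ul is the same as uM


for activity in (3.0, 10.0):
    conc = labeled_dctp_micromolar(activity, 10.0)
    print(f"{activity:.0f} uCi in 10 ul -> {conc:.2f} uM")
```

Under that assumption, 3 μCi and 10 μCi in a 10 μl reaction correspond to roughly 0.1 μM and 0.33 μM, respectively, which is in line with the range of about 0.1-0.3 μM given above. A different specific activity would scale these figures proportionally.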

Another method for detecting polymorphic variants is the T4 endonuclease VII (T4E7) mismatch cleavage method. T4E7 specifically cleaves heteroduplex DNA containing single base mismatches, deletions or insertions. The site of cleavage is 1 to 6 nucleotides 3′ of the mismatch. The enzyme pinpoints the site of sequence variation, so that sequencing can be confined to a 25-30 nucleotide segment. The major steps in identifying sequence variations in candidate genes using T4E7 are as follows. First, 400-600 bp segments are PCR amplified from a panel of DNA samples. Second, a fluorescently-labeled probe DNA is mixed with the sample DNA. Third, the samples are heated and cooled to allow the formation of heteroduplexes. Fourth, the T4E7 enzyme is added to the samples with incubation for 30 minutes at 37° C., during which cleavage occurs at sequence polymorphic variant mismatches. Fifth, the samples are run on an ABI 377 sequencing or other suitable apparatus to identify cleavage bands, which indicate the presence and location of polymorphic variants in the sequence. Sixth, a subset of PCR fragments showing cleavage is sequenced to identify the exact location and identity of each polymorphic variant. A subset of the samples containing each unique T4E7 cleavage site is selected for sequencing. DNA sequencing can, for example, be performed on ABI 377 automated DNA sequencers using BigDye chemistry and cycle sequencing. Analysis of the sequencing runs can be limited to the 30-40 bases marked by the T4E7 procedure as having the polymorphic variant.
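The relationship between a cleavage band and the location of the mismatch can be sketched as follows. This is a simplified model that assumes the detected strand carries its label at the 5′ end and that positions are counted from that end; neither assumption is stated above, and the function names are hypothetical. Cleavage is modeled as occurring 1 to 6 nucleotides 3′ of the mismatch, as described.

```python
def labeled_fragment_lengths(mismatch_pos, max_offset=6):
    """Possible lengths of the 5'-end-labeled cleavage product.

    mismatch_pos is the 1-based position of the mismatch counted from the
    labeled 5' end; cleavage is modeled 1..max_offset nucleotides 3' of it.
    """
    return [mismatch_pos + k for k in range(1, max_offset + 1)]


def candidate_mismatch_positions(band_length, max_offset=6):
    """Invert the model: mismatch positions that could explain a band."""
    return [band_length - k
            for k in range(max_offset, 0, -1)
            if band_length - k >= 1]


print(labeled_fragment_lengths(250))      # [251, 252, 253, 254, 255, 256]
print(candidate_mismatch_positions(256))  # [250, 251, 252, 253, 254, 255]
```

In this model a single observed band narrows the variant to a window of six positions, which is consistent with confining subsequent sequencing to a short segment around the cleavage site.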

Denaturing gradient gel electrophoresis (DGGE) can detect single base mutations based on differences in migration between homoduplexes and heteroduplexes [Myers et al., Nature, 313:495-498 (1985)]. The DNA sample to be tested is hybridized to a labeled wild type probe. The duplexes formed are then subjected to electrophoresis through a polyacrylamide gel that contains a gradient of DNA denaturant parallel to the direction of electrophoresis. Heteroduplexes formed due to single base variations are detected on the basis of differences in migration between the heteroduplexes and the homoduplexes formed.

In heteroduplex analysis (HET) [Keen et al., Trends Genet., 7:5 (1991)], genomic DNA is amplified by the polymerase chain reaction followed by an additional denaturing step that increases the chance of heteroduplex formation in heterozygous individuals. The PCR products are then separated on Hydrolink gels where the presence of the heteroduplex is observed as an additional band.

Chemical cleavage analysis (CCM) is based on the chemical reactivity of thymine (T) when mismatched with cytosine, guanine or thymine and the chemical reactivity of cytosine (C) when mismatched with thymine, adenine or cytosine [Cotton et al., Proc. Natl. Acad. Sci. (USA), 85:4397-4401 (1988)]. Duplex DNA formed by hybridization of a wild type probe with the DNA to be examined, is treated with osmium tetroxide for T and C mismatches and hydroxylamine for C mismatches. T and C mismatched bases that have reacted with the hydroxylamine or osmium tetroxide are then cleaved with piperidine. The cleavage products are analyzed by gel electrophoresis.

Ribonuclease cleavage involves enzymatic cleavage of RNA at a single base mismatch in an RNA:DNA hybrid [Myers et al., Science, 230:1242-1246 (1985)]. A ³²P-labeled RNA probe complementary to the wild type DNA is annealed to the test DNA and then treated with ribonuclease A. If a mismatch occurs, ribonuclease A will cleave the RNA probe and the location of the mismatch can then be determined by size analysis of the cleavage products following gel electrophoresis.

In addition to the physical methods described herein and others known to those skilled in the art [see, for example, Housman, U.S. Pat. No. 5,702,890; Housman et al., U.S. patent application Ser. No. 09/045,053], polymorphisms can be detected using computational methods, involving computer comparison of sequences from two or more different biological sources, which can be obtained in various ways, for example from public sequence databases. The term “polymorphic variant scanning” refers to a process of identifying sequence polymorphic variants using computer-based comparison and analysis of multiple representations of at least a portion of one or more genes. Computational polymorphic variant detection involves a process to distinguish true polymorphic variants from sequencing errors or other artifacts, and thus does not require perfectly accurate sequences. Such scanning can be performed in a variety of ways as known to those skilled in the art, preferably, for example, as described in U.S. patent application Ser. No. 09/300,747. The “gene” and “SNP” databases of Pubmed Entrez can also be utilized for identifying polymorphisms.
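A minimal sketch of such a computational comparison is given below. It is not the method of the applications cited above; it assumes the sequences have already been aligned to equal length, and it uses a simple, hypothetical rule (each base must be seen in at least `min_support` sequences) to separate candidate variants from isolated sequencing errors. The example sequences are invented for illustration.

```python
from collections import Counter


def scan_variants(aligned_reads, min_support=2):
    """Return {column: {base: count}} for columns with >= 2 supported bases.

    A base counts as supported only if it appears in at least min_support
    of the aligned sequences, so a base seen once is treated as a likely
    sequencing error rather than a polymorphic variant.
    """
    variants = {}
    for col in range(len(aligned_reads[0])):
        counts = Counter(read[col] for read in aligned_reads
                         if read[col] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            variants[col] = supported
    return variants


# Hypothetical pre-aligned sequences from different sources.
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTGCGTAC",
    "ACGTGCGTAC",
    "ACGTACGTTC",   # lone T at column 8: below the support threshold
]

print(scan_variants(reads))  # {4: {'A': 3, 'G': 2}}
```

Column 4 is reported because both A and G recur across sources, whereas the single T at column 8 is discarded. In practice the threshold would depend on sequence quality and depth.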

Genomic sequences, cDNA sequences, or both can be used in identifying polymorphisms. Genomic sequences are useful where the detection of polymorphism in or near splice sites is sought; such polymorphism can be in introns, exons, or overlapping intron/exon boundaries. Nucleic acid sequences analyzed may represent full or partial genomic DNA sequences for a gene or genes. Partial cDNA sequences can also be utilized although this is less preferred. As described herein, the polymorphic variant scanning analysis can utilize sequence overlap regions, even from partial sequences. While the present description is provided by reference to DNA, for example, cDNA, some sequences can be provided as RNA sequences, for example, mRNA sequences.

Interpreting the location of the polymorphic variant in the gene depends on the correct assignment of the initial ATG of the encoded protein (the translation start site). The ATG annotated in GenBank can be incorrect, but one skilled in the art will know how to carry out experiments to definitively identify the correct translation initiation codon (which is not always an ATG). In the event of any potential question concerning the proper identification of a gene or part of a gene, due, for example, to an error in recording an identifier or the absence of one or more of the identifiers, the priority for use to resolve the ambiguity is GenBank accession number, OMIM identification number, HUGO identifier, common name identifier.

Allele and genotype frequencies can be compared between cases and controls using statistical software (for example, SAS PROC NLMIXED). The odds ratios can be calculated using a log linear model by the delta method [Agresti, N.Y.: John Wiley & Sons (1990)], and statistical significance can be assessed via the chi-square test. Likelihood ratios (G2) can be used to assess goodness of fit of different models, i.e., G2 provides a measure of the reliability of the odds ratio; small G2 P-values indicate a poor fit to the model being tested. Combined genotypes can be analysed by estimating, via maximum likelihood estimation, the gamete frequencies in cases and controls using a model of the four combinations of alleles as described by Weir, Sunderland, Mass.: Sinauer (1996). Gene-gene interactive effects can be tested using a series of non-hierarchical logistic models [Piegorsch et al., Stat. Med., 13:153-162 (1994)] to estimate interactive dominant and recessive effects. A sample size as large as possible, drawn from a relatively homogeneous population, is preferred to minimize variables outside the focus of the study.
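For a single 2×2 comparison of allele counts in cases and controls, the odds ratio, its delta-method confidence interval, and the chi-square test can be sketched in plain Python as shown below. This is a simplified special case offered for illustration, not a reproduction of the SAS log linear model referred to above; the counts are hypothetical, and the P-value formula applies only to one degree of freedom.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a, b = counts of variant and reference alleles in cases;
    c, d = counts of variant and reference alleles in controls.
    The standard error of log(OR) is the delta-method approximation.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and P-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail for 1 df
    return chi2, p_value


# Hypothetical allele counts: 60/40 in cases, 40/60 in controls.
or_, lo, hi = odds_ratio_ci(60, 40, 40, 60)
chi2, p = chi_square_2x2(60, 40, 40, 60)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

With these invented counts the odds ratio is 2.25 and the chi-square statistic is 8.0. A confidence interval that excludes 1 and a small P-value would both point to a difference in allele frequency between the groups.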

Genomic DNA can be extracted from cases and controls using the QIAamp DNA Blood Mini Kit from Qiagen, UK. Genotyping of polymorphisms can be performed using PCR-RFLP (Restriction Fragment Length Polymorphism) using appropriate restriction sites for the gene(s) being studied [Frosst et al., Nature Genet., 10:111-113 (1995); Hol et al., Clin. Genet., 53:119-125 (1998); Brody et al., Am. J. Hum. Genet., 71:1207-1215 (2002)]. A polymorphism may be genotyped using an allele-specific primer extension assay and scored by matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry (Sequenom, San Diego). Appropriate controls should be included in all assays. Genotyping consistency can be tested by analyzing 10-15% of samples in duplicate.
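One way to organize such a duplicate check is sketched below. The sample identifiers, genotype calls, and function names are hypothetical; the sketch simply draws a reproducible random subset of about 10% of samples for repeat genotyping and reports the fraction of repeated calls that agree with the first call.

```python
import random


def pick_duplicates(sample_ids, fraction=0.10, seed=0):
    """Choose a reproducible random subset of samples to genotype twice."""
    n = max(1, round(fraction * len(sample_ids)))
    return random.Random(seed).sample(list(sample_ids), n)


def concordance(pairs):
    """Fraction of (first_call, repeat_call) genotype pairs that agree."""
    if not pairs:
        raise ValueError("no duplicate pairs supplied")
    return sum(first == repeat for first, repeat in pairs) / len(pairs)


sample_ids = [f"S{i:03d}" for i in range(200)]   # hypothetical identifiers
repeats = pick_duplicates(sample_ids, fraction=0.10)
print(len(repeats))                               # 20

# Hypothetical first and repeat genotype calls for four duplicated samples.
pairs = [("CC", "CC"), ("CT", "CT"), ("TT", "TT"), ("CT", "CC")]
print(concordance(pairs))                         # 0.75
```

A low concordance would suggest revisiting the assay conditions before the case-control comparison is made.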

One type of assay has been termed an array hybridization assay, an example of which is the multiplexed allele-specific diagnostic assay (MASDA) [U.S. Pat. No. 5,834,181; Shuber et al., Hum. Molec. Genet., 6:337-347 (1997)]. In MASDA, samples from multiplex PCR are immobilized on a solid support. A single hybridization is conducted with a pool of labeled allele specific oligonucleotides (ASO). The support is then washed to remove unhybridized ASOs remaining in the pool. Labeled ASO remaining on the support are detected and eluted from the support. The eluted ASOs are then sequenced to determine the mutation present.

Two assays depend on hybridization-based allele-discrimination during PCR. The TaqMan assay [U.S. Pat. No. 5,962,233; Livak et al., Nature Genet., 9:341-342, (1995)] uses allele specific (ASO) probes with a donor dye on one end and an acceptor dye on the other end such that the dye pair interact via fluorescence resonance energy transfer (FRET). A target sequence is amplified by PCR modified to include the addition of the labeled ASO probe. The PCR conditions are adjusted so that a single nucleotide difference will affect binding of the probe. Due to the 5′ nuclease activity of the Taq polymerase enzyme, a perfectly complementary probe is cleaved during the PCR while a probe with a single mismatched base is not cleaved. Cleavage of the probe dissociates the donor dye from the quenching acceptor dye, greatly increasing the donor fluorescence.

An alternative to the TaqMan assay is the molecular beacons assay [U.S. Pat. No. 5,925,517; Tyagi et al., Nature Biotech., 16:49-53 (1998)]. In the molecular beacons assay, the ASO probes contain complementary sequences flanking the target-specific species so that a hairpin structure is formed. The loop of the hairpin is complementary to the target sequence while each arm of the hairpin contains either donor or acceptor dyes. When not hybridized to a target sequence, the hairpin structure brings the donor and acceptor dye close together thereby extinguishing the donor fluorescence. When hybridized to the specific target sequence, however, the donor and acceptor dyes are separated with an increase in fluorescence of up to 900 fold. Molecular beacons can be used in conjunction with amplification of the target sequence by PCR and provide a method for real time detection of the presence of target sequences or can be used after amplification.

High throughput screening for SNPs that affect restriction sites can be achieved by Microtiter Array Diagonal Gel Electrophoresis (MADGE) [Day and Humphries, Anal. Biochem., 222:389-395, (1994)]. In this assay restriction fragment digested PCR products are loaded onto stackable horizontal gels with the wells arrayed in a microtiter format. During electrophoresis, the electric field is applied at an angle relative to the columns and rows of the wells allowing products from a large number of reactions to be resolved.

Additional assays depend on mismatch distinction by polymerases and ligases. The polymerization step in PCR places high stringency requirements on correct base pairing of the 3′ end of the hybridizing primers. This has allowed the use of PCR for the rapid detection of single base changes in DNA by using specifically designed oligonucleotides in a method variously called PCR amplification of specific alleles (PASA) [Sommer et al., Mayo Clin. Proc., 64:1361-1372 (1989); Sarkar et al., Anal. Biochem., 186:64-68 (1990)], allele-specific amplification (ASA), allele-specific PCR, and amplification refractory mutation system (ARMS) [Newton et al., Nuc. Acids Res., 17:2503-16 (1989); Nichols et al., Genomics, 5:535-40 (1989); Wu et al., Proc. Natl. Acad. Sci. (USA), 86:2757-60 (1989)]. In these methods, an oligonucleotide primer is designed that perfectly matches one allele but mismatches the other allele at or near the 3′ end. This results in the preferential amplification of one allele over the other. By using three primers that produce two differently sized products, it can be determined whether an individual is homozygous or heterozygous for the mutation [Dutton and Sommer, Bio Techniques, 11:700-702 (1991)]. In another method, termed bi-PASA, four primers are used: two outer primers that bind at different distances from the site of the SNP and two allele specific inner primers [Liu et al., Genome Res., 7:389-398 (1997)]. Each of the inner primers has a non-complementary 5′ end and forms a mismatch near the 3′ end if the proper allele is not present. Using this system, zygosity is determined based on the size and number of PCR products produced.
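The logic of allele-specific priming can be illustrated with a simplified sketch. It assumes an idealized all-or-nothing rule in which a primer is extended only when it matches the template exactly through its 3′ terminal base, whereas real reactions show preferential rather than absolute amplification. The sequences, the SNP position, and the function names are hypothetical, and primers are written in the same sense as the sequences for simplicity.

```python
def primer_extends(primer, template, snp_index):
    """True if the primer, with its 3' base placed on snp_index, matches."""
    start = snp_index - len(primer) + 1
    if start < 0:
        return False
    return template[start:snp_index + 1] == primer


def call_zygosity(alleles, primer_ref, primer_alt, snp_index):
    """Infer zygosity from which allele-specific primers give a product."""
    ref = any(primer_extends(primer_ref, s, snp_index) for s in alleles)
    alt = any(primer_extends(primer_alt, s, snp_index) for s in alleles)
    outcomes = {
        (True, False): "homozygous reference",
        (False, True): "homozygous variant",
        (True, True): "heterozygous",
    }
    return outcomes.get((ref, alt), "no amplification")


# Hypothetical alleles differing at 0-based position 9 (C versus T).
SNP = 9
ref_seq = "GATTACAGGCTTACCGA"
alt_seq = ref_seq[:SNP] + "T" + ref_seq[SNP + 1:]
primer_ref = ref_seq[SNP - 7:SNP + 1]   # 8-mer ending on the reference base
primer_alt = alt_seq[SNP - 7:SNP + 1]   # 8-mer ending on the variant base

print(call_zygosity([ref_seq, ref_seq], primer_ref, primer_alt, SNP))
print(call_zygosity([ref_seq, alt_seq], primer_ref, primer_alt, SNP))
print(call_zygosity([alt_seq, alt_seq], primer_ref, primer_alt, SNP))
```

The three calls print "homozygous reference", "heterozygous", and "homozygous variant" in turn, mirroring how the presence of one or both allele-specific products distinguishes the genotypes.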

The joining by DNA ligases of two oligonucleotides hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site, especially at the 3′ end. This sensitivity has been utilized in the oligonucleotide ligation assay [OLA; Landegren et al., Science, 241:1077-1080 (1988)] and the ligase chain reaction [LCR; Barany, Proc. Natl. Acad. Sci. (USA), 88:189-193 (1991)]. In OLA, the sequence surrounding the SNP is first amplified by PCR, whereas in LCR, genomic DNA can be used as a template.

In one method for mass screening based on the OLA, amplified DNA templates are analyzed for their ability to serve as templates for ligation reactions between labeled oligonucleotide probes [Samiotaki et al., Genomics, 20:238-242, (1994)]. In this assay, two allele-specific probes labeled with either of two lanthanide labels (europium or terbium) compete for ligation to a third biotin labeled phosphorylated oligonucleotide and the signals from the allele specific oligonucleotides are compared by time-resolved fluorescence. After ligation, the oligonucleotides are collected on an avidin-coated 96-pin capture manifold. The collected oligonucleotides are then transferred to microtiter wells in which the europium and terbium ions are released. The fluorescence from the europium ions is determined for each well, followed by measurement of the terbium fluorescence.

In alternative gel-based OLA assays, polymorphic variants can be detected simultaneously using multiplex PCR and multiplex ligation [U.S. Pat. No. 5,830,711; Day et al., Genomics, 29:152-162 (1995); Grossman et al., Nuc. Acids Res., 22:4527-4534, (1994)]. In these assays, allele specific oligonucleotides with different markers, for example, fluorescent dyes, are used. The ligation products are then analyzed together by electrophoresis on an automatic DNA sequencer distinguishing markers by size and alleles by fluorescence. In the assay by Grossman et al., 1994, mobility is further modified by the presence of a non-nucleotide mobility modifier on one of the oligonucleotides.

A further modification of the ligation assay has been termed the dye-labeled oligonucleotide ligation (DOL) assay [U.S. Pat. No. 5,945,283; Chen et al., Genome Res., 8:549-556 (1998)]. DOL combines PCR and the oligonucleotide ligation reaction in a two-stage thermal cycling sequence with fluorescence resonance energy transfer (FRET) detection. In the assay, labeled ligation oligonucleotides are designed to have annealing temperatures lower than those of the amplification primers. After amplification, the temperature is lowered to a temperature where the ligation oligonucleotides can anneal and be ligated together. This assay uses a thermostable ligase and a thermostable DNA polymerase without 5′ nuclease activity. Because FRET occurs only when the donor and acceptor dyes are in close proximity, ligation is inferred by the change in fluorescence.

In another method for the detection of polymorphic variants termed minisequencing, the target-dependent addition by a polymerase of a specific nucleotide immediately downstream (3′) to a single primer is used to determine which allele is present (U.S. Pat. No. 5,846,710). Using this method, several variants can be analyzed in parallel by separating locus specific primers on the basis of size via electrophoresis and determining allele specific incorporation using labeled nucleotides.

Determination of individual variants using solid phase minisequencing has been described by [Syvanen et al., Am. J. Hum. Genet., 52:46-59 (1993)]. In this method the sequence including the polymorphic site is amplified by PCR using one amplification primer which is biotinylated on its 5′ end. The biotinylated PCR products are captured in streptavidin-coated microtitration wells, the wells washed, and the captured PCR products denatured. A sequencing primer is then added whose 3′ end binds immediately prior to the polymorphic site, and the primer is elongated by a DNA polymerase with one single labeled dNTP complementary to the nucleotide at the polymorphic site. After the elongation reaction, the sequencing primer is released and the presence of the labeled nucleotide detected. Alternatively, dye labeled dideoxynucleoside triphosphates (ddNTPs) can be used in the elongation reaction [U.S. Pat. No. 5,888,819; Shumaker et al., Human Mut., 7:346-354, (1996)]. In this method, incorporation of the ddNTP is determined using an automatic gel sequencer.

Minisequencing has also been adapted for use with microarrays [Shumaker et al., Human Mut., 7:346-354 (1996)]. In this case, elongation (extension) primers are attached to a solid support such as a glass slide. Methods for construction of oligonucleotide arrays are well known to those of ordinary skill in the art and can be found, for example, in Nature Genetics, Suppl., Jan. 21, 1999. PCR products are spotted on the array and allowed to anneal. The extension (elongation) reaction is carried out using a polymerase, a labeled dNTP and noncompeting ddNTPs. Incorporation of the labeled dNTP is then detected by the appropriate means. In a variation of this method suitable for use with multiplex PCR, extension is accomplished with the use of the appropriate labeled ddNTP and unlabeled ddNTPs [Pastinen et al., Genome Res., 7:606-614 (1997)].

Solid phase minisequencing has also been used to detect multiple polymorphic nucleotides from different templates in an undivided sample [Pastinen et al., Clin. Chem., 42:1391-1397 (1996)]. In this method, biotinylated PCR products are captured on the avidin-coated manifold support and rendered single stranded by alkaline treatment. The manifold is then placed serially in four reaction mixtures containing extension primers of varying lengths, a DNA polymerase and a labeled ddNTP, and the extension reaction allowed to proceed. The manifolds are inserted into the slots of a gel containing formamide which releases the extended primers from the template. The extended primers are then identified by size and fluorescence on a sequencing instrument.

Fluorescence resonance energy transfer (FRET) has been used in combination with minisequencing to detect polymorphic variants [U.S. Pat. No. 5,945,283; Chen et al., Proc. Natl. Acad. Sci. USA, 94:10756-10761 (1997)]. In this method, the extension primers are labeled with a fluorescent dye, for example fluorescein. The ddNTPs used in primer extension are labeled with an appropriate FRET dye. Incorporation of the ddNTPs is determined by changes in fluorescence intensities.

The above discussion of methods for the detection of SNPs is exemplary only and is not intended to be exhaustive. Those of ordinary skill in the art will be able to envision other methods for detection of polymorphic variants that are within the scope and spirit of the invention.

Polymorphisms are detected in a target nucleic acid from an individual being analyzed. For assay of genomic DNA, virtually any biological sample other than pure red blood cells is suitable. “Tissue” means any sample taken from any subject, preferably a human. For example, convenient tissue samples include whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal epithelium, skin and hair. For assay of cDNA or mRNA, the tissue sample should be obtained from an organ in which the target nucleic acid is expressed.

Many of the methods described involve amplification of DNA from target samples. This can be accomplished by, e.g., PCR. Other suitable amplification methods include the ligase chain reaction (LCR) [see Wu and Wallace, Genomics, 4:560 (1989), Landegren et al., Science, 241:1077 (1988)], transcription amplification [Kwoh et al., Proc. Natl. Acad. Sci. (USA), 86:1173 (1989)], self-sustained sequence replication [Guatelli et al., Proc. Nat. Acad. Sci. (USA), 87:1874 (1990)] and nucleic acid sequence-based amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.

Single base extension methods are described by e.g., U.S. Pat. No. 5,846,710, U.S. Pat. No. 6,004,744, U.S. Pat. No. 5,888,819 and U.S. Pat. No. 5,856,092. Generally, the methods work by hybridizing a primer that is complementary to a target sequence such that the 3′ end of the primer is immediately adjacent to, but does not span a site of, potential variation in the target sequence. That is, the primer comprises a subsequence from the complement of a target polynucleotide terminating at the base that is immediately adjacent and 5′ to the polymorphic site. The term primer refers to a single-stranded oligonucleotide capable of acting as a point of initiation of template-directed DNA synthesis under appropriate conditions (i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer but typically ranges from 15 to 40 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template but should be sufficiently complementary to hybridize with a template. The term primer site refers to the area of the target DNA to which a primer hybridizes. The term primer pair means a set of primers including a 5′ upstream primer that hybridizes with the 5′ end of the DNA sequence to be amplified and a 3′, downstream primer that hybridizes with the complement of the 3′ end of the sequence to be amplified. Hybridization probes are capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include nucleic acids and peptide nucleic acids as described in [Nielsen et al., Science, 254:1497-1500 (1991)]. 
A probe or primer can be labeled, if desired, by incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include ³²P, fluorescent dyes, electron dense reagents, enzymes (as commonly used in an ELISA), biotin, or haptens and proteins for which antisera or monoclonal antibodies are available. A label can also be used to “capture” the primer, so as to facilitate the immobilization of either the primer or a primer extension product, such as amplified DNA, on a solid support. The hybridization is performed in the presence of one or more labeled nucleotides complementary to base(s) that may occupy the site of potential variation. For example, for biallelic polymorphisms, two differentially labeled nucleotides can be used. For tetraallelic polymorphisms, four differentially-labeled nucleotides can be used. In some methods, particularly methods employing multiple differentially labeled nucleotides, the nucleotides are dideoxynucleotides. Hybridization is performed under conditions permitting primer extension if a nucleotide complementary to a base occupying the site of variation in the target sequence is present. Extension incorporates a labeled nucleotide, thereby generating a labeled extended primer. If multiple differentially-labeled nucleotides are used and the target is heterozygous, then multiple differentially-labeled extended primers can be obtained. Extended primers are detected, providing an indication of which base(s) occupy the site of variation in the target polynucleotide.
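By way of illustration only, the single base extension logic described above can be sketched in Python. The function names, the 20-nucleotide default primer length, and the convention that the target is supplied as a 5′ to 3′ string with a 0-based polymorphic position are assumptions made for this sketch and are not part of the described methods.

```python
def extension_primer(strand, site, length=20):
    """Return a primer whose 3' end is immediately 5' of the polymorphic site.

    `strand` is read 5'->3' and `site` is the 0-based index of the
    polymorphic base (both conventions are assumptions of this sketch).
    The primer is the stretch just upstream of the site, so it anneals to
    the opposite strand and does not span the site itself.
    """
    if site < length:
        raise ValueError("not enough upstream sequence for this primer length")
    return strand[site - length:site]


def call_genotype(detected_labels):
    """Map the set of labeled ddNTP bases seen on extended primers to a genotype."""
    bases = sorted(detected_labels)
    if len(bases) == 1:
        return bases[0] * 2        # one label detected: homozygous
    if len(bases) == 2:
        return "".join(bases)      # two labels detected: heterozygous
    raise ValueError("expected one or two incorporated bases for a diploid sample")
```

In this idealized model the primer is extended by exactly one labeled dideoxynucleotide, so a heterozygous sample yields two differently labeled products and a homozygous sample yields one.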

An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and only primes amplification of an allelic form to which the primer exhibits perfect complementarity. [See Gibbs, Nucleic Acid Res., 17:2427-2448 (1989).] This primer is used in conjunction with a second primer that hybridizes at a distal site. Amplification proceeds from the two primers, leading to a detectable product signifying that the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch prevents amplification and no detectable product is formed. In some methods, the mismatch is included in the 3′-most position of the oligonucleotide aligned with the polymorphism because this position is most destabilizing to elongation from the primer. [See, e.g., WO 93/22456.] In other methods, a double-base mismatch is used in which the first mismatch is included in the 3′-most position of the oligonucleotide aligned with the polymorphism and a second mismatch is positioned at the immediately adjacent base (the penultimate 3′ position). This double mismatch further prevents amplification in instances in which there is no match between the 3′ position of the primer and the polymorphism.
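The placement of the mismatch in an allele-specific primer can be illustrated with the following Python sketch. The helper names are hypothetical, the choice of the complementary base for the deliberate penultimate mismatch is an arbitrary assumption (any non-matching base would serve), and the amplification rule is an idealization that ignores reaction conditions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def allele_specific_primer(upstream, allele, penultimate_mismatch=False):
    """Build a primer whose 3'-most base is aligned with the polymorphism.

    `upstream` is the sequence immediately 5' of the polymorphic site and
    `allele` is the base being tested. With `penultimate_mismatch`, the base
    next to the 3' end is deliberately changed to introduce a second mismatch.
    """
    if penultimate_mismatch:
        upstream = upstream[:-1] + COMPLEMENT[upstream[-1]]
    return upstream + allele


def primes_amplification(primer, template_allele):
    """Idealized rule: extension occurs only if the 3' base matches the allele."""
    return primer[-1] == template_allele
```

For example, a primer built for the "G" allele would be expected to yield product from a "G" template and none from an "A" template under this simplified rule.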

Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. [Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, (W. H. Freeman and Co, New York, (1992)), Chapter 7.]

Arrays provide a high throughput technique that can assay a large number of polynucleotides in a sample. In one aspect of the invention, an array is constructed comprising one or more of the genes, proteins or antibodies relevant to the invention, comprising one or more of these sequences. This technology can be used as a tool to test for differential expression, or for genotyping. Arrays can be created by spotting polynucleotide probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional matrix or array having bound probes. The probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. Techniques for constructing arrays and methods of using these arrays are described in, for example, Schena et al., Proc. Natl. Acad. Sci. (USA), 93(20):10614-9 (1996); Schena et al., Science, 270(5235):467-70 (1995); Shalon et al., Genome Res., 6(7):639-45 (1996), U.S. Pat. No. 5,807,522, EP 799 897; WO 97/29212; WO 97/27317; EP 785 280; WO 97/02357; U.S. Pat. No. 5,593,839; U.S. Pat. No. 5,578,832; EP 728 520; U.S. Pat. No. 5,599,695; EP 721 016; U.S. Pat. No. 5,556,752; WO 95/22058; and U.S. Pat. No. 5,631,734.

In some embodiments, an array comprises probes specific for one or more allelic variants for a given gene. Probes that specifically bind to the allele of interest can be used, and reaction conditions for hybridization to the array can be adjusted accordingly. The probes utilized in the arrays can be of varying types and can include, for example, synthesized probes of relatively short length (e.g., a 20-mer or a 25-mer), cDNA (full length or fragments of gene), amplified DNA, fragments of DNA (generated by restriction enzymes, for example) and reverse transcribed RNA. Both custom and generic arrays can be utilized in detecting differential expression levels. Custom arrays can be prepared using probes that hybridize to particular preselected subsequences of mRNA gene sequences or amplification products prepared from them. Many variations on methods of detection using arrays are within the skill in the art and within the scope of the invention. For example, rather than immobilizing the probe to a solid support, the test sample can be immobilized on a solid support that is then contacted with the probe.

Screening may also be based on the functional or antigenic characteristics of the protein. Immunoassays designed to detect predisposing polymorphisms in proteins relevant to the invention can be used in screening. Antibodies specific for a polymorphism variant or gene products may be used in screening immunoassays. A sample is taken from a subject. Samples, as used herein, include biological fluids such as tracheal lavage, blood, cerebrospinal fluid, tears, saliva, lymph, dialysis fluid and the like; organ or tissue culture derived fluids; and fluids extracted from physiological tissues. Samples can also include derivatives and fractions of such fluids. In some embodiments, the sample is derived from a biopsy. The number of cells in a sample will generally be at least about 10³, usually at least about 10⁴, more usually at least about 10⁵. The cells can be dissociated, in the case of solid tissues, or tissue sections may be analyzed. Alternatively, a lysate of the cells can be prepared.

In some embodiments, detection utilizes staining of cells or histological sections, performed in accordance with conventional methods. The antibodies of interest are added to the cell sample, and incubated for a period of time sufficient to allow binding to the epitope, usually at least about 10 minutes. The antibody may be labeled with radioisotopes, enzymes, fluorescers, chemiluminescers, or other labels for direct detection. Alternatively or in addition, a second stage antibody or reagent can be used to amplify the signal. For example, the primary antibody can be conjugated to biotin, with horseradish peroxidase-conjugated avidin added as a second stage reagent. Final detection uses a substrate that undergoes a color change in the presence of the peroxidase. The absence or presence of antibody binding may be determined by various methods, including flow cytometry of dissociated cells, microscopy, radiography, scintillation counting, etc.

An alternative method for diagnosis depends on the in vitro detection of binding between antibodies and protein encoded by the polymorphic variant in a lysate. Measuring the concentration of protein binding in a sample or fraction thereof can be accomplished by a variety of specific assays. A conventional sandwich type assay may be used. For example, a sandwich assay can first attach polymorphic variant protein specific antibodies to an insoluble surface or support. The particular manner of binding is not crucial so long as it is compatible with the reagents and overall methods of the invention. Binding may be covalent or non-covalent.

Other immunoassays are known in the art and may find use as diagnostics. Ouchterlony plates provide a simple determination of antibody binding. Western blots can be performed on protein gels or protein spots on filters, using a detection system specific for polymorphic variant protein as desired, conveniently using a labeling method as described for the sandwich assay.

The invention provides a method for determining a genotype of an individual in relation to one or more polymorphic variants in one or more of the genes identified in above aspects by using mass spectrometric determination of a nucleic acid sequence that is a portion of a gene identified for other aspects of this invention or a complementary sequence. Such mass spectrometric methods are known to those skilled in the art. In preferred embodiments, the method involves determining the presence or absence of a polymorphic variant in a gene by determining the nucleotide sequence of the nucleic acid sequence, where the nucleotide sequence is 100 nucleotides or less in length, preferably 50 or less, more preferably 30 or less, and still more preferably 20 nucleotides or less. In general, such a nucleotide sequence includes at least one polymorphic variant site, preferably a polymorphic variant site which is informative with respect to the expected response of a patient to a treatment as described for above aspects.

Therapies

The invention provides methods for choosing a relevant therapeutic strategy based on the detection of one or more polymorphic variants. In some embodiments, the polymorphic variant indicates an altered susceptibility to a particular disease state. In some embodiments, the variant is associated with an increased susceptibility for that disease state. In some embodiments, for the MTHFD1 1958A variant, a resulting diagnosis for an increased susceptibility for a pregnancy complication such as severe placental abruption or a second trimester miscarriage would indicate a therapy that helps minimize or eliminate such complications. In some embodiments, for the MTHFD1L “ATT” seven repeat intron variant of rs3832406, a resulting diagnosis for an increased susceptibility for a pregnancy complication such as a neural tube defect indicates a therapy that helps minimize or eliminate such complications. In some embodiments, for the MTHFD1L “ATT” seven repeat intron variant of rs3832406, a resulting diagnosis for an increased susceptibility for a cancer drug complication would indicate a therapy that helps minimize such a complication. Accordingly, the invention provides a method for determining whether a compound has a differential effect due to the presence or absence of at least one polymorphic variant in a gene or a variant form of a gene. In some embodiments, the method comprises identifying a subset of patients with enhanced or diminished response or tolerance to a treatment method or a method of administration of a treatment where the treatment is for a disease or condition in the patient. General methods of testing effects of a polymorphic variant for an effect on drug efficacy are known to those of skill in the art and are provided in various sources such as U.S. Pat. Nos. 6,537,759; 6,664,062; and 6,759,200.

One or more polymorphic variants in one or more genes in a plurality of patients can be correlated with response to a particular treatment such as a drug or more specifically a drug regimen including dosage, administration, and other relevant parameters. The correlation can be performed by determining the one or more polymorphic variants in the one or more genes in the plurality of patients and correlating the presence or absence of each of the polymorphic variants (alone or in various combinations) with the patient's response to a particular treatment. The polymorphic variants can be previously known to exist or can also be determined de novo, or combinations of prior information and newly determined information may be used. The enhanced or diminished response should be statistically significant, preferably such that p=0.10 or less, more preferably 0.05 or less, and most preferably 0.02 or less. A positive correlation between the presence of one or more polymorphic variants and an enhanced response to treatment is indicative that the treatment is particularly effective in the group of patients having those polymorphic variants. A positive correlation of the presence of the one or more polymorphic variants with a diminished response to the treatment is indicative that the treatment will be less effective in the group of patients having those polymorphic variants. Such information is useful, for example, for selecting or de-selecting patients for a particular treatment or method of administration of a treatment, or for demonstrating that a group of patients exists for which the treatment or method of treatment would be particularly beneficial or contra-indicated. Such demonstration can be beneficial, for example, for obtaining government regulatory approval for a new drug or a new use of a drug.
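As one non-limiting illustration of how such a correlation might be tested for statistical significance, the following Python sketch applies a two-sided Fisher exact test to a 2×2 table of variant presence versus treatment response. The choice of this particular test, the function name, and the table layout are assumptions of the sketch; other statistical methods may equally be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are variant present / variant absent,
    columns are responders / non-responders.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x responders among variant carriers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
```

With hypothetical counts of 8 responders and 2 non-responders among carriers, and 1 responder and 5 non-responders among non-carriers, `fisher_exact_two_sided(8, 2, 1, 5)` gives a p-value of about 0.035, which would satisfy the 0.05 threshold but not the 0.02 threshold mentioned above.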

In some embodiments, a first patient or set of patients suffering from a disease or condition is identified whose response to a treatment differs from the response (to the same treatment) of a second patient or set of patients suffering from the same disease or condition, and it is then determined whether the frequency of at least one polymorphic variant in at least one gene differs between the first patient or set of patients and the second patient or set of patients. A correlation between the presence or absence of the polymorphic variant or polymorphic variants and the response of the patient or patients to the treatment indicates that the polymorphic variant provides information about variable patient response. The method can involve identifying at least one polymorphic variant in at least one gene. In some embodiments, a first patient or set of patients suffering from a disease or condition and having a particular genotype, haplotype or combination of genotypes or haplotypes is identified, and a second patient or set of patients suffering from the same disease or condition and having a genotype or haplotype or sets of genotypes or haplotypes that differ in a specific way from those of the first set of patients is identified. The extent and magnitude of clinical response can be compared between the first patient or set of patients and the second patient or set of patients. A correlation between the presence or absence of a polymorphic variant or polymorphic variants or haplotypes and the response of the patient or patients to the treatment indicates that the polymorphic variant provides information about variable patient response and is useful for the invention.

Polymorphic variants of relevance include those that can affect one or more of: the susceptibility of individuals to a disease; the course or natural history of a disease; and the response of a patient with a disease to a medical intervention, such as, for example, a drug, a biologic substance, physical energy such as radiation therapy, or a specific dietary regimen. Variation in any of these three parameters can constitute the basis for initiating a pharmacogenetic study directed to the identification of the genetic sources of interpatient variation. The effect of a DNA sequence polymorphic variant or polymorphic variants on disease susceptibility or natural history are of particular interest as the polymorphic variants can be used to define patient subsets that behave differently in response to medical interventions. Useful gene sequence polymorphic variants for this invention can be described as polymorphic variants that partition patients into two or more groups that respond differently to a therapy, regardless of the reason for the difference, and regardless of whether the reason for the difference is known.

Once the presence or absence of a polymorphic variant or polymorphic variants in a gene or genes is shown to correlate with the efficacy or safety of a treatment method, that information can be used to select an appropriate treatment method for a particular patient. In the case of a treatment which is more likely to be effective when administered to a patient who has at least one copy of a gene with a particular polymorphic variant or polymorphic variants (in some cases the correlation with effective treatment is for patients who are homozygous for polymorphic variant or set of polymorphic variants in a gene) than in patients with a different polymorphic variant or set of polymorphic variants, a method of treatment is selected (and/or a method of administration) which correlates positively with the particular polymorphic variant presence or absence which provides the indication of effectiveness. Such selection can involve a variety of different choices, and the correlation can involve a variety of different types of treatments, or choices of methods of treatment. In some cases, the selection can include choices between treatments or methods of administration where more than one method is likely to be effective, or where there is a range of expected effectiveness or different expected levels of contra-indication or deleterious effects. In such cases, the selection can be performed to select a treatment that will be as effective or more effective than other methods, while having a comparatively low level of deleterious effects. Similarly, where the selection is between methods with differing levels of deleterious effects, preferably a method is selected that has low such effects but that is expected to be effective in the patient. 
Alternatively, in cases where the presence or absence of the particular polymorphic variant or polymorphic variants is indicative that a treatment or method of administration is more likely to be ineffective or contra-indicated in a patient with that polymorphic variant or polymorphic variants, then such treatment or method of administration is generally eliminated for use in that patient.

The term “therapy” refers to a process that is intended to produce a beneficial change in the condition of a mammal, for example, a human, often referred to as a patient. A beneficial change can include one or more of: restoration of function; reduction of symptoms; limitation or retardation of progression of a disease, disorder, or condition; or prevention, limitation or retardation of deterioration of a patient's condition, disease or disorder. Such therapy can involve nutritional modifications, administration of radiation, administration of a drug, behavioral modifications and combinations of these, among others.

The terms “inhibit,” “prevent,” and “treat,” as well as words stemming therefrom, as used herein, do not necessarily imply 100% or complete inhibition, prevention, or treatment. Rather, there are varying degrees of inhibition, prevention, or treatment which one of ordinary skill in the art recognizes as having a potential benefit or therapeutic effect. In this respect, the present inventive methods can provide any amount of inhibition of metastasis of a cancer cell, any level of prevention of metastasis of a cancer cell, or any degree of treatment of a cancer in a subject. The term “patient” refers to both human and veterinary subjects. The term “subject” or “individual” typically refers to humans, but also to mammals and other animals, multicellular organisms such as plants, and single-celled organisms or viruses.

If a given polymorphism variant correlates with an increase in the expression level or activity of the protein encoded by the variant, the complications associated with the variant can be treated by administering an antagonist of the protein. If a given polymorphism variant correlates with a complication involving a decrease in the expression level or activity of the protein encoded by the variant, the complications can be treated by administering the protein itself, a nucleic acid encoding the protein that can be expressed in a patient, or an analog or agonist of the protein. In the case of pregnancy complications and polymorphism variants of genes encoding enzymes involved in a one carbon metabolic pathway such as MTHFD1, MTHFD1L, and MTHFR, folate, Vitamin B₁₂, and/or other B vitamins are administered to the woman subject who is pregnant or planning a pregnancy. Other treatments can include, but are not limited to, surgery, the administration of pharmaceutical compounds or nutritional supplements, and behavioral changes such as improved diet, increased exercise, reduced alcohol intake, smoking cessation, etc.

The invention comprises a method for determining a method of treatment effective to treat a disease or condition by altering the level of activity of a product of an allele of a gene selected from the genes described herein, and determining whether that alteration provides a differential effect related to reducing or alleviating a disease or condition as compared to at least one alternative allele or an alteration in toxicity or tolerance of the treatment by a patient or patients. The presence of such a differential effect indicates that altering that level of activity provides at least part of an effective treatment for the disease or condition.

Information gained from analyzing genetic material for the presence of polymorphisms can be used to design treatment regimes involving gene therapy. For example, detection of a polymorphism that either affects the expression of a gene or results in the production of a mutant protein can be used to design an artificial gene to aid in the production of normal, wild type protein or help restore normal gene expression. Once designed, the gene can be placed in the individual by any suitable means known in the art. [Gene Therapy Technologies, Applications and Regulations, Meager, ed., Wiley (1999); Gene Therapy Principles and Applications, Blankenstein, ed., Birkhauser Verlag (1999); Jain, Textbook of Gene Therapy, Hogrefe and Huber (1998)].

Several methods can be used for assessing the medical and pharmaceutical implications of a polymorphic variant, including computational methods, in vitro and/or in vivo experimental methods, prospective human clinical trials, and other laboratory and clinical measures. Informatics-based approaches include DNA and protein sequence analysis, such as phylogenetic approaches and motif searching, and protein modeling. Tools are available for modeling the structure of proteins with unsolved structure, particularly if there is a related protein with known structure. [Rost et al., J. Mol. Biol., 270:471-480 (1997); Firestine et al., Chem. Biol., 3:779-783 (1996)]. Methods are also available for identifying conserved domains and vital amino acid residues of proteins of unknown structure by analysis of phylogenetic relationships. [Deleage et al., Biochimie, 79:681-686 (1997); Taylor et al., Protein Sci., 3:1858-1870 (1994).] These methods can permit the prediction of functionally important polymorphic variants, either on the basis of structure or evolutionary conservation. Phylogenetic approaches to understanding sequence variation can also be used. If a sequence polymorphic variant occurs at a nucleotide or encoded amino acid residue where there is usually little or no variation in homologs of the protein of interest from non-human species, particularly evolutionarily remote species, then the polymorphic variant can be more likely to affect function of the RNA or protein.
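The conservation reasoning above can be illustrated with a minimal Python sketch that scores how invariant a position is across homologs. It assumes the homolog sequences have already been aligned to equal length with "-" marking gaps; the function name and scoring rule (fraction of sequences sharing the most common residue) are hypothetical simplifications and not a substitute for the phylogenetic methods cited.

```python
def column_conservation(aligned_homologs, column):
    """Fraction of non-gap residues at `column` that match the most common residue.

    `aligned_homologs` is a list of pre-aligned, equal-length sequences.
    A value near 1.0 suggests little or no variation at that position.
    """
    residues = [seq[column] for seq in aligned_homologs if seq[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(residues)
```

Under this simplified view, a polymorphic variant falling at a position with a score at or near 1.0 across evolutionarily remote species would be a candidate for closer functional study.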

Clinical Trial

A clinical trial can be used to evaluate differential efficacy of or tolerance to a treatment in a subset of patients who have a particular polymorphic variant or polymorphic variants in at least one gene. A “clinical trial” is the testing of a therapeutic intervention in a volunteer human population for the purpose of determining whether a therapeutic intervention is safe and/or efficacious in the human volunteer or patient population for a given disease, disorder, or condition. Clinical trials can comprise Phase I, II, III, or IV trials. In general, the polymorphisms relevant to the invention are useful for conducting clinical trials of drug candidates for the disease state, conditions and complications of the invention. Such trials can be performed on treated or control populations having similar or identical polymorphic profiles at a defined collection of polymorphic sites. Use of genetically matched populations eliminates or reduces variation in treatment outcome due to genetic factors, leading to a more accurate assessment of the efficacy of a potential drug. In some embodiments, the set of polymorphisms may be used to stratify the enrolled patients into disease sub-types or classes. In some embodiments, the polymorphisms are used to identify subsets of patients with similar polymorphic profiles who have an unusually high or low response to treatment or who do not respond at all. Information about the underlying genetic factors influencing response to treatment can be used in many aspects of the development of treatments, such as identification of new targets, through the design of new trials, product labeling, and patient targeting. Additionally, the polymorphisms can be used to identify the genetic factors involved in adverse response to treatment.

Diagnostic tests for a specific polymorphic variant or variant form of a gene can be incorporated in the clinical trial protocol as inclusion or exclusion criteria for enrollment in the trial, to allocate certain patients to treatment or control groups within the clinical trial or to assign patients to different treatment cohorts. In some embodiments, diagnostic tests for specific polymorphic variants are performed on all patients within a clinical trial, and statistical analysis performed comparing and contrasting the efficacy or safety of a drug between individuals with different polymorphic variants or variant forms of the gene or genes. Diagnostic tests for polymorphic variants can be performed on groups of patients known to have efficacious responses to the drug to identify differences in the frequency of polymorphic variants between responders and non-responders. In some embodiments, diagnostic tests for polymorphic variants are performed on groups of patients known to have toxic responses to the drug to identify differences in the frequency of the polymorphic variant between those having adverse events and those not having adverse events. Such outlier analyses are useful if a limited number of patient samples are available for analysis. Embodiments involving clinical trials include the genetic stratification strategies, phases, statistical analyses, sizes, and other relevant parameters.

Prior to establishment of a diagnostic test for use in the selection of a treatment method or elimination of a treatment method, the presence or absence of one or more specific polymorphic variants in a gene or in multiple genes is correlated with a differential treatment response. Such a differential response can be determined using prospective and/or retrospective data. The determination can be performed by analyzing the presence or absence of particular polymorphic variants in patients who have previously been treated with a particular treatment method, and correlating the polymorphic variant presence or absence with the observed course, outcome, and/or development of adverse events in those patients. Alternatively, the analysis can be performed prospectively, where the presence or absence of the polymorphic variant or polymorphic variants in an individual is determined and the course, outcome, and/or development of adverse events in those patients is subsequently or concurrently observed and then correlated with the polymorphic variant determination.

General methods for performing clinical trials are well known in the art. [Guide to Clinical Trials by Bert Spilker, Raven Press, 1991; The Randomized Clinical Trial and Therapeutic Decisions by Niels Tygstrup (Editor), Marcel Dekker; Recent Advances in Clinical Trial Design and Analysis (Cancer Treatment and Research, Ctar 75) by Peter F. Thall (Editor) Kluwer Academic Pub, 1995.] Additional design considerations include defining what the genetic hypothesis is, how it is to be tested, how many patients will need to be enrolled to have adequate statistical power to measure an effect of a specified magnitude, definition of primary and secondary endpoints, and methods of statistical analysis. The design of the trial can incorporate the preclinical data sets to determine the primary and secondary endpoints. Endpoints can include whether the therapeutic intervention is efficacious, efficacious with undesirable side effects, ineffective, ineffective with undesirable side effects, or ineffective with deleterious effects. Pharmacoeconomic analyses can be incorporated to support the cases in which the intervention is efficacious or efficacious with undesirable side effects, whereby the clinical outcome is positive, and economic analyses are carried out for the support of overall benefit to the patient and to society. The strategies for designing a clinical trial to test the effect of a genotypic polymorphic variant or polymorphic variants can be modified based upon the data and information from the preclinical studies and the patient symptomatic parameters unique to the target indication.

A clinical trial in which pharmacogenetic related efficacy or toxicity endpoints are included in the primary or secondary endpoints can be part of a retrospective or prospective clinical trial. In the design of these trials, the allelic differences are identified and stratification based upon these genotypic differences among patient or subject groups is used to ascertain the significance of the impact a genotype has on the candidate therapeutic intervention. Retrospective pharmacogenetic trials can be conducted at each of the phases of clinical development, with the assumption that sufficient data is available for the correlation of the physiologic effect of the candidate therapeutic intervention and the allelic polymorphic variant or polymorphic variants within the treatment population. In the case of a retrospective trial, the data collected from the trial can be re-analyzed by imposing the additional stratification on groups of patients by specific allelic polymorphic variants that may exist in the treatment groups. Retrospective trials can be useful to test a hypothesis that a specific polymorphic variant has a significant effect on the efficacy or toxicity profile for a candidate therapeutic intervention. Retrospective or prospective human clinical trials are performed to test whether the identified allelic polymorphic variant, polymorphic variants, or haplotypes or combination thereof influence the efficacy or toxicity profiles for a given drug or other therapeutic intervention.

In designing a pharmacogenetic trial, retrospective analysis of Phase II or Phase III clinical data can indicate trial variables for which further analysis should be obtained. A placebo-controlled pharmacogenetic clinical trial design can be one in which the target allelic polymorphic variant or polymorphic variants are identified and a diagnostic test is performed to stratify the patients based upon presence, absence, or combination thereof of these polymorphic variants. In the Phase II or Phase III stage of clinical development, determination of a specific sample size for a prospective trial includes factors such as expected differences between a placebo and treatment on the primary or secondary endpoints and a consideration of the allelic frequencies.

A prospective clinical trial has the advantage that the trial can be designed so that the trial objectives can be met with a specified statistical power. In these cases, power analysis, which includes the parameters of allelic polymorphic variant frequency, number of treatment groups, and ability to detect positive outcomes, can ensure that the trial objectives are met.
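As an illustration only, the kind of power calculation referred to above can be sketched as follows. The sketch assumes a binary response endpoint, a two-sided two-proportion z-test under the normal approximation, and Hardy-Weinberg proportions for carrier frequency; the function names and the example response rates are hypothetical and are not taken from this specification.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_placebo, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a difference between two
    response rates (two-sided z-test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_placebo * (1 - p_placebo) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_placebo
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def carrier_frequency(allele_freq):
    """Fraction of individuals carrying at least one copy of the variant,
    assuming Hardy-Weinberg proportions (an assumption of this sketch)."""
    return 1 - (1 - allele_freq) ** 2


def patients_to_screen(n_per_arm, carrier_freq, arms=2):
    """Expected number of patients to genotype so that carriers fill all arms."""
    return math.ceil(arms * n_per_arm / carrier_freq)


if __name__ == "__main__":
    n = patients_per_arm(0.30, 0.50)           # hypothetical response rates
    freq = carrier_frequency(0.20)             # hypothetical allele frequency
    print(n, round(freq, 2), patients_to_screen(n, freq))
```

The lower the frequency of the variant, the more patients must be genotyped to fill a carrier-only stratum, which is why allelic frequency enters the sample size determination.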

The design of a pharmacogenetic clinical trial can include a description of the impact of the allelic polymorphic variant on the observed efficacy between the treatment groups. Using this type of design, the type of genetic and phenotypic relationship displayed by the efficacy response to a candidate therapeutic intervention is analyzed. For example, genotypically dominant allelic polymorphic variants are those in which both heterozygotes and homozygotes demonstrate a specific phenotypic efficacy response different from the homozygous recessive genotypic group. A pharmacogenetic approach is useful for clinicians and public health professionals to include or eliminate small groups of responders or non-responders from treatment in order to avoid unjustified side effects. Further, adjustment of dosages when there is a clear clinical difference between heterozygous and homozygous individuals may be beneficial for therapy with the candidate therapeutic intervention.

In some embodiments, a recessive allelic polymorphic variant or polymorphic variants are those in which only the homozygote recessive for that or those polymorphic variants will demonstrate a specific phenotypic efficacy response different from the heterozygotes or homozygous wildtype. In some embodiments, allelic polymorphic variant or polymorphic variants organized by haplotypes from additional gene or genes are included to help explain clinical phenotypic outcome differences among the treatment groups. These types of clinical studies can identify an allelic polymorphic variant and its role in the efficacy or toxicology pattern within the treatment population.

Statistical Analysis of Data

A variety of informative comparisons can be used to identify correlations in the clinical data. In some embodiments, a plurality of pairwise comparisons of treatment response and the presence or absence of at least one polymorphic variant can be performed for a plurality of patients. The response of at least one patient homozygous for at least one polymorphic variant can be compared with at least one patient homozygous for the alternative form of that polymorphic variant or polymorphic variants. The response of at least one patient heterozygous for at least one polymorphic variant can be compared with the response of at least one patient homozygous for the at least one polymorphic variant. The heterozygous patient response can be compared to both alternative homozygous forms, or the response of heterozygous patients is grouped with the response of one class of homozygous patients and said group is compared to the response of the alternative homozygous group.

One approach to analyzing the clinical data is as follows. First, variability between patients in the response to a particular treatment is observed. Second, at least a portion of the variable response is correlated with the presence or absence of at least one polymorphic variant in at least one gene. Third, an analytical or diagnostic test is provided to determine the presence or absence of the at least one polymorphic variant in individual patients. Fourth, the presence or absence of the polymorphic variant or polymorphic variants is used to select a patient for a treatment or to select a treatment for a patient, or the polymorphic variant information is used in other methods described herein.

Polymorphic variants in a gene can be correlated empirically with treatment response, which can be used to identify polymorphic variants in a gene that exist in a population. The presence of the different polymorphic variants or haplotypes in individuals of a study group, which can be representative of a population or populations, is determined. This polymorphic variant information is then correlated with treatment response of the various individuals as an indication that genetic variability in the gene is at least partially responsible for differential treatment response. Statistical measures known to those skilled in the art can be used to measure the fraction of interpatient variation attributable to any one polymorphic variant. Useful methods for identifying genes relevant to the physiologic action of a drug or other treatment are known to those skilled in the art, and include large scale analysis of gene expression in cells treated with the drug compared to control cells, or large scale analysis of the protein expression pattern in treated vs. untreated cells, or the use of techniques for identification of interacting proteins or ligand-protein interactions.

The gene comprising the polymorphic variant can be involved in drug action, and the variant forms of the gene are associated with variability in the action of the drug. In some embodiments, one variant form of the gene is associated with the action of the drug such that the drug will be effective in an individual who is heterozygous or homozygous for the variant. In some embodiments, a variant form of the gene is associated with the action of the drug such that the drug will be toxic or otherwise contra-indicated in a homozygous or heterozygous individual.

In one embodiment, patients are stratified by genotype by one candidate polymorphic variant in the candidate gene locus. Genetic stratification of patients can be accomplished in several ways, including the following (where “X” is the more frequent form of the polymorphic variant being assessed and “x” is the less frequent form): (a) XX vs. xx; (b) XX vs. Xx vs. xx; (c) XX vs. (Xx+xx); (d) (XX+Xx) vs. xx. The effect of genotype on drug response phenotype can be affected by a variety of nongenetic factors, and it can be beneficial to measure the effect of genetic stratification in a subgroup of the overall clinical trial population. Subgroups can be defined in a number of ways including, for example, biological, clinical, pathological or environmental criteria. Biological criteria include sex (gender), age, hormonal status and reproductive history, ethnic, racial, or geographic origin, or surrogate markers of ethnic, racial or geographic origin. Clinical criteria include disease status and disease manifestations. Pathological criteria include histopathologic features of disease tissue, or pathological diagnosis; pathological stage; loss of heterozygosity (LOH), pathology studies, and laboratory studies. Frequency of responders is measured in each genetic subgroup. Subgroups can be defined in several ways: more than two age groups, and age-related status such as pre- or post-menopausal. One can also stratify by haplotype at one candidate locus where the haplotype is made up of two polymorphic variants, three polymorphic variants or greater than three polymorphic variants. A variety of statistical methods exist for measuring the difference between two or more groups in a clinical trial. One skilled in the art will recognize that different methods are suited to different data sets. In general, there is a family of methods customarily used in clinical trials, and another family of methods customarily used in genetic epidemiological studies. Methods from either family can be suitable for performing statistical analysis of pharmacogenetic clinical trial data.
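By way of a non-limiting sketch, the stratification schemes (a) through (d) described above, together with the measurement of responder frequency in each genetic subgroup, could be expressed as follows. The data layout (a list of genotype and response pairs) and all names are assumptions made for illustration.

```python
from collections import defaultdict

# Grouping rules for schemes (a)-(d); None means the genotype is left out
# of the comparison (scheme (a) compares only the two homozygous groups).
SCHEMES = {
    "a": {"XX": "XX", "Xx": None, "xx": "xx"},
    "b": {"XX": "XX", "Xx": "Xx", "xx": "xx"},
    "c": {"XX": "XX", "Xx": "Xx+xx", "xx": "Xx+xx"},
    "d": {"XX": "XX+Xx", "Xx": "XX+Xx", "xx": "xx"},
}


def responder_frequency(patients, scheme):
    """patients: iterable of (genotype, responded) pairs.
    Returns {group label: fraction of patients in the group who responded}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [responders, total]
    for genotype, responded in patients:
        group = SCHEMES[scheme][genotype]
        if group is None:
            continue
        counts[group][0] += int(responded)
        counts[group][1] += 1
    return {group: r / n for group, (r, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical trial data, for illustration only.
    trial = [("XX", True), ("XX", False), ("Xx", True), ("xx", True), ("xx", True)]
    for name in "abcd":
        print(name, responder_frequency(trial, name))
```

The same function can be applied to any nongenetic subgroup (for example, a single age group) by filtering the patient list before it is passed in.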

Conventional clinical trial statistics include hypothesis testing and descriptive methods. Guidance in the selection of appropriate statistical tests for a particular data set can be obtained from texts such as: [Biostatistics: A Foundation for Analysis in the Health Sciences, 7th edition (Wiley Series in Probability and Mathematical Statistics, Applied Probability and statistics) by Wayne W. Daniel, John Wiley & Sons, 1998; Bayesian Methods and Ethics in a Clinical Trial Design (Wiley Series in Probability and Mathematical Statistics. Applied Probability Section) by J. B. Kadane (Editor), John Wiley & Sons, 1996].

Hypothesis testing statistical procedures include the following examples: one-sample procedures (binomial confidence interval, Wilcoxon signed rank test, permutation test with general scores, generation of exact permutational distributions); two-sample procedures (t-test, Wilcoxon-Mann-Whitney test, normal score test, median test, Van der Waerden test, Savage test, logrank test for censored survival data, Wilcoxon-Gehan test for censored survival data, Cochran-Armitage trend test, permutation test with general scores, generation of exact permutational distributions); R×C contingency tables (Fisher's exact test, Pearson's chi-squared test, likelihood ratio test, Kruskal-Wallis test, Jonckheere-Terpstra test, linear-by-linear association test, McNemar's test, marginal homogeneity test for matched pairs); stratified 2×2 contingency tables (test of homogeneity for odds ratio, test of unity for the common odds ratio, confidence interval for the common odds ratio); stratified 2×C contingency tables (all two-sample procedures listed above with stratification, confidence intervals for the odds ratios and trend, generation of exact permutational distributions); general linear models (simple regression, multiple regression, analysis of variance (ANOVA), analysis of covariance, response-surface models, weighted regression, polynomial regression, partial correlation, multivariate analysis of variance (MANOVA), repeated measures analysis of variance); analysis of variance and covariance with a nested (hierarchical) structure; designs and randomized plans for nested and crossed experiments (completely randomized design for two treatments, split-plot design, hierarchical design, incomplete block design, Latin square design); nonlinear regression models; logistic regression for unstratified or stratified data, for binary or ordinal response data, using the logit link function, the normit function or the complementary log-log function; probit, logit, ordinal logistic and gompit regression models; fitting parametric models to failure time data that may be right-, left-, or interval-censored, where tested distributions can include extreme value, normal and logistic distributions, and, by using a log transformation, exponential, Weibull, lognormal, loglogistic and gamma distributions; and computing non-parametric estimates of survival distribution with right-censored data and rank tests for association of the response variable with other variables.

Descriptive statistical methods include factor analysis with rotations, canonical correlation, principal component analysis for quantitative variables, principal component analysis for qualitative data, hierarchical and dynamic clustering methods to create tree structure, dendrogram or phenogram, simple and multiple correspondence analysis using a contingency table as input or raw categorical data. Specific instructions and computer programs for performing the above calculations can be obtained from companies such as: SAS/STAT Software, SAS Institute Inc., Cary, N.C., USA; BMDP Statistical Software, BMDP Statistical Software Inc., Los Angeles, Calif., USA; SYSTAT software, SPSS Inc., Chicago, Ill., USA; StatXact & LogXact, CYTEL Software Corporation, Cambridge, Mass., USA.

Genetic epidemiological methods can also be useful in carrying out statistical tests for the invention. Guidance in the selection of appropriate genetic statistical tests for analysis of a particular data set can be obtained from texts such as: [Fundamentals of Genetic Epidemiology (Monographs in Epidemiology and Biostatistics, Vol. 22) by M. J. Khoury, B. H. Cohen & T. H. Beaty, Oxford Univ. Press, 1993; Methods in Genetic Epidemiology by Newton E. Morton, S. Karger Publishing, 1983; Methods in Observational Epidemiology, 2nd edition (Monographs in Epidemiology and Biostatistics, V. 26) by J. L. Kelsey (Editor), A. S. Whittemore & A. S. Evans, 1996; Clinical Trials: Design, Conduct, and Analysis (Monographs in Epidemiology and Biostatistics, Vol 8) by C. L. Meinert & S. Tonascia, 1986)].

Parsimony methods can be used to classify DNA sequences, haplotypes or phenotypic characters. The parsimony principle maintains that the best explanation for the observed differences among sequences, phenotypes (individuals, species), etc., is provided by the smallest number of evolutionary changes. Put another way, simpler hypotheses are preferred over more complicated ones to explain a set of data or patterns [Molecular Systematics, Hillis et al. (1996)]. These methods for inferring relationships among sequences operate by minimizing the number of evolutionary steps or mutations (changes from one sequence or character to another) required to explain a given set of data. To obtain relationships among a set of sequences and construct a structure, such as a tree or topology, the minimum number of mutations required to explain the observed evolutionary changes among the set of sequences is first counted. A structure is constructed based on this number. Additional structures are tried, and the structure that requires the smallest number of mutational steps is chosen as the likely structure or evolutionary tree for the sequences studied.
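One conventional way to count the minimum number of mutational steps on a candidate tree is Fitch's small-parsimony algorithm. It is offered here only as an illustrative choice; the specification does not name a particular algorithm. The sketch below assumes aligned sequences of equal length and rooted binary trees written as nested tuples of leaf names, and all names are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.
    tree is a leaf name (str) or a (left, right) tuple of subtrees."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The subtrees can agree on a state, so no extra change is needed.
        return common, left_cost + right_cost
    # The subtrees disagree, so one additional mutation is required here.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Total minimum number of changes over all sites of the alignment."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(length)
    )


def most_parsimonious(trees, sequences):
    """Among the candidate trees, pick the one requiring the fewest changes."""
    return min(trees, key=lambda t: parsimony_score(t, sequences))


if __name__ == "__main__":
    # Hypothetical three-site haplotypes, for illustration only.
    seqs = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTG"}
    candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
    for t in candidates:
        print(t, parsimony_score(t, seqs))
```

Only the candidate trees supplied are compared; searching the full space of topologies is a separate problem that this sketch does not address.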

If the computed frequency of the polymorphic variants and/or haplotypes is equal to the number of individuals in the population, then additional methods can be considered. In these cases, and if the population is small, the number of haplotypes is considered relative to the number of entrants. Homozygotes can be assigned one unambiguous haplotype. An individual with a single-site polymorphic variant (mutation) on one of the chromosomes will have two haplotypes. As the number of polymorphic variants increases in the diploid chromosomes, each of these polymorphic variants is compared with the haplotypes of the original population. Then a frequency is assigned to the new polymorphic variant based upon the Hardy-Weinberg expected frequencies. [See generally, Clark, Mol. Biol. and Evol. (1990).]
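The two elementary steps mentioned above, assigning haplotypes where phase is unambiguous and computing Hardy-Weinberg expected frequencies, can be sketched as follows. This is a simplified illustration and not the full inference procedure of the cited reference; the genotype representation (a list of allele pairs, one pair per site) and the function names are assumptions.

```python
def resolve_unambiguous(genotype):
    """genotype: list of (allele1, allele2) tuples, one per polymorphic site.
    Returns the pair of haplotypes when phase is unambiguous (zero or one
    heterozygous site), otherwise None."""
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if len(het_sites) > 1:
        return None  # phase cannot be read directly from the genotype
    hap1 = "".join(a for a, _ in genotype)
    hap2 = "".join(b for _, b in genotype)
    return hap1, hap2


def hardy_weinberg_expected(hap_freqs, hap1, hap2):
    """Expected frequency of the haplotype pair under Hardy-Weinberg
    proportions: p^2 for identical haplotypes, 2pq otherwise."""
    p, q = hap_freqs[hap1], hap_freqs[hap2]
    return p * p if hap1 == hap2 else 2 * p * q


if __name__ == "__main__":
    print(resolve_unambiguous([("A", "A"), ("C", "C")]))  # homozygote
    print(resolve_unambiguous([("A", "G"), ("C", "C")]))  # one heterozygous site
    print(resolve_unambiguous([("A", "G"), ("C", "T")]))  # ambiguous phase
    freqs = {"AC": 0.7, "GC": 0.3}                         # hypothetical frequencies
    print(hardy_weinberg_expected(freqs, "AC", "GC"))
```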

The statistical significance of the differences between polymorphic variant frequencies can be assessed by a Pearson chi-squared test of homogeneity of proportions with n−1 degrees of freedom, where n is the number of polymorphic variants. Then, in order to determine which polymorphic variant(s) is responsible for an eventual significance, one can consider each polymorphic variant individually against the rest, up to n comparisons, each based on a 2×2 table. This approach should result in chi-squared tests that are individually valid; taking the most significant of these tests is a form of multiple testing. A Bonferroni's adjustment for multiple testing can be made to the P-values, such as p* = 1 − (1 − p)^n. The statistical significance of the difference between genotype frequencies associated with every polymorphic variant can be assessed by a Pearson chi-squared test of homogeneity of proportions with 2 degrees of freedom, using the same Bonferroni's adjustment as above.
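The each-variant-against-the-rest comparison and the adjustment p* = 1 − (1 − p)^n can be sketched as below. The sketch assumes allele counts are available as dictionaries keyed by variant, uses the Pearson statistic without continuity correction, and requires that no row or column total of a 2×2 table be zero; the function names are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and P-value
    with 1 degree of freedom for the table [[a, b], [c, d]].
    All four marginal totals must be non-zero."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def adjust(p, n):
    """Multiple-testing adjustment p* = 1 - (1 - p)^n for n comparisons."""
    return 1 - (1 - p) ** n


def each_variant_vs_rest(case_counts, control_counts):
    """case_counts, control_counts: dicts mapping variant -> allele count.
    Returns variant -> (statistic, P-value, adjusted P-value)."""
    n = len(case_counts)
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    results = {}
    for variant in case_counts:
        a = case_counts[variant]
        c = control_counts[variant]
        stat, p = chi2_2x2(a, case_total - a, c, control_total - c)
        results[variant] = (stat, p, adjust(p, n))
    return results


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    cases = {"v1": 10, "v2": 20, "v3": 30}
    controls = {"v1": 20, "v2": 20, "v3": 20}
    for variant, result in each_variant_vs_rest(cases, controls).items():
        print(variant, result)
```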

Testing for unequal haplotype frequencies between cases and controls can be considered in the same framework as testing for unequal polymorphic variant frequencies, because a single polymorphic variant can be considered as a haplotype of a single locus. The relevant likelihood ratio test compares a model where two separate sets of haplotype frequencies apply to the cases and controls, to one where the entire sample is characterized by a single common set of haplotype frequencies. This comparison can be performed by repeated use of a computer program [Terwilliger and Ott, 1994, Handbook of Human Linkage Analysis, Baltimore, Johns Hopkins University Press] to successively obtain the log-likelihood corresponding to the set of haplotype frequency estimates on the cases (lnLcase), on the controls (lnLcontrol), and on the overall sample (lnLcombined). The test statistic 2((lnLcase)+(lnLcontrol)−(lnLcombined)) is then chi-squared with r−1 degrees of freedom, where r is the number of haplotypes. To test for potentially confounding effects or effect-modifiers, such as sex, age, etc., logistic regression can be used with case-control status as the outcome variable, and genotypes and covariates, plus possible interactions, as predictor variables.
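A simplified sketch of the likelihood ratio statistic follows. It assumes that haplotype counts are observed directly (phase known), so that the maximum-likelihood frequency of each haplotype is simply its count divided by the total; the cited program instead estimates frequencies from unphased genotypes, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def log_likelihood(counts):
    """Multinomial log-likelihood evaluated at the maximum-likelihood
    frequencies (count / total). The multinomial coefficient is omitted
    because it cancels in the likelihood ratio."""
    total = sum(counts.values())
    return sum(c * math.log(c / total) for c in counts.values() if c > 0)


def haplotype_lrt(case_counts, control_counts):
    """Return (2(lnLcase + lnLcontrol - lnLcombined), degrees of freedom),
    where degrees of freedom = number of haplotypes - 1."""
    haplotypes = set(case_counts) | set(control_counts)
    combined = {
        h: case_counts.get(h, 0) + control_counts.get(h, 0) for h in haplotypes
    }
    stat = 2 * (
        log_likelihood(case_counts)
        + log_likelihood(control_counts)
        - log_likelihood(combined)
    )
    return stat, len(haplotypes) - 1


if __name__ == "__main__":
    # Hypothetical haplotype counts, for illustration only.
    cases = {"H1": 30, "H2": 10}
    controls = {"H1": 10, "H2": 30}
    print(haplotype_lrt(cases, controls))
```

The statistic is compared with a chi-squared distribution having the returned degrees of freedom; a statistical package can supply the corresponding P-value.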

Drug Screening

Drug screening assays can be performed on cells that have been transfected with a nucleic acid encoding all or part of one of the polymorphic variants relevant to the invention. In some embodiments, no endogenous equivalents of transfected nucleic acids are present in the cells. The cells can be transfected with RNA in which case expression of the polymorphic variant protein is transient. Alternatively, the nucleic acid can be stably introduced into the cell line. In those embodiments wherein the nucleic acid encodes a one carbon metabolic pathway enzyme, cells expressing protein are monitored for relative levels of pathway molecules. The control can be vehicle without an agent or can be an agent known not to have any effect on a one carbon metabolic pathway. The control can be a known agonist and/or antagonist of a one carbon metabolic pathway. Transfected cells are also useful for identifying genes whose expression pattern is altered in the presence of one or more of the polymorphic variants relevant to the invention relative to wildtype form. Such genes themselves are potential therapeutic or diagnostic targets.

In some embodiments, drug screening assays are performed on transgenic animals. Some transgenic animals have an exogenous human transgene bearing a polymorphic variant relevant to the invention. In some such animals, the endogenous equivalent(s) of the transgene(s) is/are knocked out. In other transgenic animals, the endogenous gene is mutated to contain one of the variant forms relevant to the invention. Potential agents are administered to the transgenic animal, and relevant parameters are measured. The performance can be compared with that of a transgenic animal administered a control substance or with a nontransgenic animal administered the agent or a control substance.

The invention provides a pharmaceutical composition that includes a compound that has a differential effect in patients having at least one copy, or alternatively, two copies of a form of a gene as identified for aspects above and a pharmaceutically acceptable carrier, excipient, or diluent. The composition is adapted to be preferentially effective to treat a patient with cells containing one, two, or more copies of the form of the gene.

The methods and materials of the invention can utilize conventional pharmaceutical compositions more effectively by identifying patients who are likely to benefit from a particular treatment, patients for whom a particular treatment is less likely to be effective, or for whom a particular treatment is likely to produce undesirable or intolerable effects. In some embodiments, compositions are adapted to be preferentially effective in patients who possess particular genetic characteristics, i.e., in whom a particular polymorphic variant or polymorphic variants in one or more genes is present or absent—depending on whether the presence or the absence of the polymorphic variant or polymorphic variants in a patient is correlated with an increased expectation of beneficial response. In some embodiments, one or more polymorphic variants indicates that a patient can beneficially receive a significantly higher dosage of a drug than a patient having a different polymorphic variant or polymorphic variants. An indication or suggestion can specify that a patient be heterozygous, or alternatively, homozygous for a particular polymorphic variant or polymorphic variants or variant form of a gene. In some embodiments, an indication or suggestion specifies that a patient have no more than one copy, or zero copies, of a particular polymorphic variant, polymorphic variants, or variant form of a gene.

In some embodiments involving pharmaceutical compositions, active compounds, or drugs, the material is subject to a regulatory limitation or restriction on approved uses or indications, e.g., by the U.S. Food and Drug Administration (FDA), limiting approved use of the composition to patients having at least one copy of the particular form of the gene that contains at least one polymorphic variant. In some embodiments, the composition is subject to a regulatory limitation or restriction on approved uses indicating that the composition is not approved for use or should not be used in patients having at least one copy of a form of the gene including at least one polymorphic variant. In some embodiments, the composition is packaged, and the packaging includes a label or insert indicating or suggesting beneficial therapeutic approved use of the composition in patients having one or two copies of a form of the gene including at least one polymorphic variant. Alternatively, the label or insert limits approved use of the composition to patients having zero or one or two copies of a form of the gene including at least one polymorphic variant. The latter embodiment would be likely where the presence of the at least one polymorphic variant in one or two copies in cells of a patient means that the composition would be ineffective or deleterious to the patient. In some embodiments, the composition is indicated for use in treatment of a disease or condition which is one of those identified for aspects above. In some embodiments, the at least one polymorphic variant includes at least one polymorphic variant from those identified herein.

The term “packaged” means that the drug, compound, or composition is prepared in a manner suitable for distribution or shipping with a box, vial, pouch, bubble pack, or other protective container, which may also be used in combination. The packaging can have printing on it and/or printed material may be included in the packaging. In some embodiments, the drug is subject to a regulatory limitation or suggestion or warning as described above that limits or suggests limiting approved use to patients having specific polymorphic variants or variant forms of a gene in order to achieve maximal benefit and avoid toxicity or other deleterious effect.

A pharmaceutical composition can be adapted to be preferentially effective in a variety of ways. In some embodiments, an active compound is selected that was not previously known to be differentially active, or which was not previously recognized as a potential therapeutic compound. In some embodiments, the concentration of an active compound that has differential activity can be adjusted such that the composition is appropriate for administration to a patient with the specified polymorphic variants. In some embodiments, the presence of a specified polymorphic variant may allow or require the administration of a much larger dose, which would not be practical with a previously utilized composition. In some embodiments, a patient requires a much lower dose, such that administration of such a dose with a prior composition would be impractical or inaccurate. The composition can be prepared in a higher or lower unit dose form, or prepared in a higher or lower concentration of the active compound or compounds. In yet other cases, the composition can include additional compounds needed to enable administration of a particular active compound in a patient with the specified polymorphic variants, which was not in previous compositions, for example, because the majority of patients did not require or benefit from the added component.

In some embodiments, a drug is explicitly indicated for, and/or its approved use is restricted to, individuals in the population with specific polymorphic variants or combinations of polymorphic variants, as determined by diagnostic tests for polymorphic variants or variant forms of certain genes involved in the disease or condition or involved in the action of the drug. Such drugs can provide more effective treatment for a disease or condition in a population identified or characterized with the use of a diagnostic test for a specific polymorphic variant or variant form of the gene if the gene is involved in the action of the drug or in determining a characteristic of the disease or condition. Such drugs can be developed using the diagnostic tests for specific polymorphic variants or variant forms of a gene to determine the inclusion of patients in a clinical trial.

The invention also comprises a method for producing a pharmaceutical composition by identifying a compound that has differential activity against a disease or condition in patients having at least one polymorphic variant in a gene, compounding the pharmaceutical composition by combining the compound with a pharmaceutically acceptable carrier, excipient, or diluent such that the composition is preferentially effective in patients who have at least one copy of the polymorphic variant or polymorphic variants. In some embodiments, the patient has two copies of the polymorphic variant or polymorphic variants. In some embodiments, the disease or condition, gene or genes, polymorphic variants, methods of administration, or method of determining the presence or absence of polymorphic variants is as described for other aspects of this invention.

The invention also comprises a method for producing a pharmaceutical agent by identifying a compound which has differential activity against a disease or condition in patients having at least one copy of a form of a gene having at least one polymorphic variant and synthesizing the compound in an amount sufficient to provide a pharmaceutical effect in a patient suffering from the disease or condition. The compound can be identified by conventional screening methods and its activity confirmed. Compound libraries can be screened to identify compounds which differentially bind to products of variant forms of a particular gene product, or which differentially affect expression of variant forms of the particular gene, or which differentially affect the activity of a product expressed from such gene.

The invention also includes methods of manufacturing a medicament comprising one or more of the materials of the invention in the treatment of one or more of the diseases of the invention. Therapeutic agents and regimens further include homocysteine monitoring, B vitamin supplementation, for example, folate, FOLTX®, B₁₂, and chemotherapeutic agents. Each FOLTX® tablet contains 2.5 mg of folacin (folic acid), 25 mg of pyridoxine (Vitamin B₆), and 2 mg of cyanocobalamin (Vitamin B₁₂).

Formulation

A therapeutic agent, which can be a compound and/or a composition, relevant to the invention can comprise a small molecule, a nucleic acid, a protein, an antibody, or any other agent with one or more therapeutic properties. The therapeutic agent can be formulated in any pharmaceutically acceptable manner. In some embodiments, the therapeutic agent is prepared in a depot form such that its release into the body to which it is administered is controlled with respect to time and location within the body (see, for example, U.S. Pat. No. 4,450,150). Depot forms of therapeutic agents can be, for example, an implantable composition comprising the therapeutic agent and a porous or non-porous material, such as a polymer, wherein the therapeutic agent is encapsulated by or diffused throughout the material and is released by diffusion through the material and/or by degradation of the non-porous material. The depot is then implanted into the desired location within the body and the therapeutic agent is released from the implant at a predetermined rate.

The therapeutic agent that is used in the invention can be formed as a composition, such as a pharmaceutical composition comprising a carrier and a therapeutic compound. Pharmaceutical compositions containing the therapeutic agent can comprise more than one therapeutic agent. The pharmaceutical composition can alternatively comprise a therapeutic agent in combination with other pharmaceutically active agents or drugs, such as chemotherapeutic agents, for example, a cancer drug.

The carrier can be any suitable carrier. Preferably, the carrier is a pharmaceutically acceptable carrier. With respect to pharmaceutical compositions, the carrier can be any of those conventionally used and is limited only by physicochemical considerations, such as solubility and lack of reactivity with the active compound(s), and by the route of administration. In addition to the pharmaceutical compositions described below, the therapeutic compounds of the present inventive methods can be formulated as inclusion complexes, such as cyclodextrin inclusion complexes, or liposomes.

The pharmaceutically acceptable carriers described herein, for example, vehicles, adjuvants, excipients, and diluents, are well-known to those skilled in the art and are readily available to the public. The pharmaceutically acceptable carrier can be one that is chemically inert to the active agent(s) and that has no detrimental side effects or toxicity under the conditions of use. The choice of carrier can be determined in part by the particular therapeutic agent, as well as by the particular method used to administer the therapeutic compound. There are a variety of suitable formulations of the pharmaceutical composition of the invention. The following formulations for oral, aerosol, parenteral, subcutaneous, transdermal, transmucosal, intestinal, intramedullary, direct intraventricular, intravenous, intranasal, intraocular, intramuscular, intraarterial, intrathecal, intraperitoneal, rectal, and vaginal administration are exemplary and are in no way limiting. More than one route can be used to administer the therapeutic agent, and in some instances, a particular route can provide a more immediate and more effective response than another route. Depending on the specific conditions being treated, such agents can be formulated and administered systemically or locally. Techniques for formulation and administration may be found in [Remington's Pharmaceutical Sciences, 18th ed., Mack Publishing Co., Easton, Pa. (1990)].

Formulations suitable for oral administration can include (a) liquid solutions, such as an effective amount of the inhibitor dissolved in diluents, such as water, saline, or orange juice; (b) capsules, sachets, tablets, lozenges, and troches, each containing a predetermined amount of the active ingredient, as solids or granules; (c) powders; (d) suspensions in an appropriate liquid; and (e) suitable emulsions. Liquid formulations may include diluents, such as water and alcohols, for example, ethanol, benzyl alcohol, and the polyethylene alcohols, either with or without the addition of a pharmaceutically acceptable surfactant. Capsule forms can be of the ordinary hard or soft shelled gelatin type containing, for example, surfactants, lubricants, and inert fillers, such as lactose, sucrose, calcium phosphate, and corn starch. Tablet forms can include one or more of lactose, sucrose, mannitol, corn starch, potato starch, alginic acid, microcrystalline cellulose, acacia, gelatin, guar gum, colloidal silicon dioxide, croscarmellose sodium, talc, magnesium stearate, calcium stearate, zinc stearate, stearic acid, and other excipients, colorants, diluents, buffering agents, disintegrating agents, moistening agents, preservatives, flavoring agents, and other pharmacologically compatible excipients. Lozenge forms can comprise the inhibitor in a flavor, usually sucrose and acacia or tragacanth, as well as pastilles comprising the inhibitor in an inert base, such as gelatin and glycerin, or sucrose and acacia, emulsions, gels, and the like containing, in addition to the active ingredient, such excipients as are known in the art.

Pharmaceutical preparations that can be used orally include push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules can contain the active ingredients in admixture with filler such as lactose, binders such as starches, and/or lubricants such as talc or magnesium stearate and, optionally, stabilizers. In soft capsules, the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added.

The therapeutic agent, alone or in combination with other suitable components, can be made into aerosol formulations to be administered via inhalation. These aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like. They also can be formulated as pharmaceuticals for non-pressurized preparations, such as in a nebulizer or an atomizer. Such spray formulations also may be used to spray mucosa. Topical formulations are well known to those of skill in the art. Such formulations are particularly suitable in the context of the invention for application to the skin.

Injectable formulations are in accordance with the invention. The parameters for effective pharmaceutical carriers for injectable compositions are well-known to those of ordinary skill in the art [see, e.g., Pharmaceutics and Pharmacy Practice, J.B. Lippincott Company, Philadelphia, Pa., Banker and Chalmers, eds., pages 238-250 (1982), and ASHP Handbook on Injectable Drugs, Trissel, 4th ed., pages 622-630 (1986)]. For injection, the agents of the invention can be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hanks' solution, Ringer's solution, or physiological saline buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art.

Formulations suitable for parenteral administration include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. The therapeutic agent can be administered in a physiologically acceptable diluent in a pharmaceutical carrier, such as a sterile liquid or mixture of liquids, including water, saline, aqueous dextrose and related sugar solutions, an alcohol, such as ethanol or hexadecyl alcohol, a glycol, such as propylene glycol or polyethylene glycol, dimethylsulfoxide, glycerol, ketals such as 2,2-dimethyl-1,3-dioxolane-4-methanol, ethers, poly(ethyleneglycol) 400, oils, fatty acids, fatty acid esters or glycerides, or acetylated fatty acid glycerides with or without the addition of a pharmaceutically acceptable surfactant, such as a soap or a detergent, suspending agent, such as pectin, carbomers, methylcellulose, hydroxypropylmethylcellulose, or carboxymethylcellulose, or emulsifying agents and other pharmaceutical adjuvants.

Oils that can be used in parenteral formulations include petroleum, animal, vegetable, or synthetic oils. Specific examples of oils include peanut, soybean, sesame, cottonseed, corn, olive, petrolatum, and mineral. Suitable fatty acids for use in parenteral formulations include oleic acid, stearic acid, and isostearic acid. Ethyl oleate and isopropyl myristate are examples of suitable fatty acid esters.

Suitable soaps for use in parenteral formulations include fatty alkali metal, ammonium, and triethanolamine salts, and suitable detergents include (a) cationic detergents such as, for example, dimethyl dialkyl ammonium halides, and alkyl pyridinium halides, (b) anionic detergents such as, for example, alkyl, aryl, and olefin sulfonates, alkyl, olefin, ether, and monoglyceride sulfates, and sulfosuccinates, (c) nonionic detergents such as, for example, fatty amine oxides, fatty acid alkanolamides, and polyoxyethylenepolypropylene copolymers, (d) amphoteric detergents such as, for example, alkyl-β-aminopropionates, and 2-alkyl-imidazoline quaternary ammonium salts, and (e) mixtures thereof.

The parenteral formulations will typically contain from about 0.5% to about 25% by weight of the inhibitor in solution. Preservatives and buffers may be used. In order to minimize or eliminate irritation at the site of injection, such compositions may contain one or more nonionic surfactants having a hydrophile-lipophile balance (HLB) of from about 12 to about 17. The quantity of surfactant in such formulations will typically range from about 5% to about 15% by weight. Suitable surfactants include polyethylene glycol sorbitan fatty acid esters, such as sorbitan monooleate and the high molecular weight adducts of ethylene oxide with a hydrophobic base, formed by the condensation of propylene oxide with propylene glycol. The parenteral formulations can be presented in unit-dose or multi-dose sealed containers, such as ampoules and vials, and can be stored in a freeze-dried (lyophilized) condition requiring only the addition of the sterile liquid excipient, for example, water, for injections, immediately prior to use. Extemporaneous injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.

The therapeutic agent can be made into suppositories by mixing with a variety of bases, such as emulsifying bases or water-soluble bases. Formulations suitable for vaginal administration can be presented as pessaries, tampons, creams, gels, pastes, foams, or spray formulas containing, in addition to the active ingredient, such carriers as are known in the art to be appropriate.

The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition. [See, e.g., Fingl et al., in The Pharmacological Basis of Therapeutics, 1975, Ch. 1, p. 1]. The attending physician can determine when to terminate, interrupt, or adjust administration due to toxicity or to organ dysfunction. Conversely, the attending physician can also adjust treatment to higher levels if the clinical response is not adequate (precluding toxicity). The magnitude of an administered dose in the management of the disorder of interest will vary with the severity of the condition to be treated and the route of administration. The severity of the condition may, for example, be evaluated, in part, by standard prognostic evaluation methods. The dose, and perhaps the dose frequency, can vary according to the age, body weight, and response of the individual patient. A program comparable to that discussed above can be used in veterinary medicine.

Use of pharmaceutically acceptable carriers to formulate the compounds herein disclosed for the practice of the invention into dosages suitable for systemic administration is within the scope of the invention. With proper choice of carrier and suitable manufacturing practice, the compositions relevant to the invention, in particular, those formulated as solutions, can be administered parenterally, such as by intravenous injection. The compounds can be formulated readily using pharmaceutically acceptable carriers well known in the art into dosages suitable for oral administration. Such carriers enable the compounds relevant to the invention to be formulated as tablets, pills, capsules, liquids, gels, syrups, slurries, dragees, solutions, suspensions and the like, for oral ingestion by a patient to be treated.

Agents intended to be administered intracellularly may be administered using techniques well known to those of ordinary skill in the art. For example, such agents may be encapsulated into liposomes, then administered as described above. Liposomes are spherical lipid bilayers with aqueous interiors. All molecules present in an aqueous solution at the time of liposome formation are incorporated into the aqueous interior. The liposomal contents are both protected from the external microenvironment and, because liposomes fuse with cell membranes, are efficiently delivered into the cell cytoplasm. Additionally, due to their hydrophobicity, small organic molecules may be directly administered intracellularly.

Pharmaceutical compositions suitable for use in the invention include compositions wherein the active ingredients are contained in an amount effective to achieve their intended purpose. In addition to the active ingredients, these pharmaceutical compositions can contain suitable pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate processing of the active compounds into preparations which can be used pharmaceutically. The pharmaceutical compositions relevant to the invention can be manufactured in a manner that is itself known, for example, by mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.

Pharmaceutical formulations for parenteral administration include aqueous solutions of the active compounds in water-soluble form. Additionally, suspensions of the active compounds can be prepared as appropriate oily injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. Aqueous injection suspensions can contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the compounds to allow for the preparation of highly concentrated solutions.

Pharmaceutical preparations for oral use can be obtained by combining the active compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carboxymethylcellulose, and/or polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate. Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions can be used, which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses.

Administration

The invention also provides selecting a method of administration of an agent to a patient suffering from a disease or condition, by determining the presence or absence of at least one polymorphic variant in cells of the patient, where such presence or absence is indicative of an appropriate method of administration of the agent. The selection of a treatment regimen can involve selecting a dosage level or frequency of administration or route of administration of the agent(s) or combinations of those parameters. In some embodiments, two or more agents are administered, and the selecting involves selecting a method of administration for one, two, or more than two of the agents, jointly, concurrently, or separately. As understood by those skilled in the art, such plurality of agents is often used in combination therapy, and thus may be formulated in a single drug, or may be separate drugs administered concurrently, serially, or separately. Other embodiments are as indicated above for selection of second treatment methods, methods of identifying polymorphic variants, and methods of treatment as described for aspects above. The frequency of administration is generally selected to achieve a pharmacologically effective average or peak serum level without excessive deleterious effects. In some embodiments, the serum level of the drug is maintained within a therapeutic window of concentrations for the greatest percentage of time possible without such deleterious effects as would cause a prudent physician to reduce the frequency of administration for a particular dosage level. Administration of a particular treatment, for example, administration of a therapeutic compound or combination of compounds, is chosen depending on the disease or condition which is to be treated. In some embodiments, the disease or condition is one for which administration of a treatment is expected to provide a therapeutic benefit. 
In embodiments involving selection of a patient for a treatment, selection of a method or mode of administration of a treatment, and selection of a patient for a treatment or a method of treatment, the selection can be positive selection or negative selection. The methods can include modifying or eliminating a treatment for a patient, modifying or eliminating a method or mode of administration of a treatment to a patient, or modification or elimination of a patient for a treatment or method of treatment. A patient can be selected for a method of administration of a treatment, by detecting the presence or absence of at least one polymorphic variant in a gene as identified herein, where the presence or absence of the at least one polymorphic variant is indicative that the treatment or method of administration will be effective in the patient. If the at least one polymorphic variant is present in the patient's cells, then the patient is selected for administration of the treatment.

Dosage

The term “drug” or “therapeutic agent” as used herein refers to a chemical entity or biological product, or combination of chemical entities or biological products, administered to a person to treat, prevent, or control a disease or condition. In some embodiments, the chemical entity or biological product is a low molecular weight compound. A “low molecular weight compound” has a molecular weight of <5,000 Da, <2,500 Da, <1,000 Da, or <700 Da. In some embodiments, the chemical entity is a larger compound, for example, an oligomer of nucleic acids, amino acids, or carbohydrates including without limitation proteins, oligonucleotides, ribozymes, DNAzymes, glycoproteins, lipoproteins, and modifications and combinations thereof. In some embodiments, the biological product is a monoclonal or polyclonal antibody or fragment thereof such as a variable chain fragment; cells; or an agent or product arising from recombinant technology, such as, without limitation, a recombinant protein, recombinant vaccine, or DNA construct developed for therapeutic use. The term “drug” or “therapeutic agent” can include, without limitation, compounds that are approved for sale as pharmaceutical products by government regulatory agencies such as the U.S. Food and Drug Administration (USFDA or FDA), the European Medicines Evaluation Agency (EMEA), and a world regulatory body governing the International Conference on Harmonisation (ICH) rules and guidelines, compounds that do not require approval by government regulatory agencies, food additives or supplements including compounds commonly characterized as vitamins, natural products, and completely or incompletely characterized mixtures of chemical entities including natural compounds or purified or partially purified natural products. In some embodiments, the drug is approved by a government agency for treatment of a specific disease or condition.
The term “drug” as used herein is synonymous with the terms “agent,” “therapeutic agent,” “compound,” “therapeutic compound,” “composition,” “therapeutic composition,” “medicine,” “pharmaceutical product,” or “product.”

In treating a patient exhibiting a disorder of interest, a therapeutically effective amount of an agent or agents is administered. A therapeutically effective dose refers to that amount of the compound that results in amelioration of one or more symptoms or a prolongation of survival in a patient. The amount or dose of the therapeutic compound administered should be sufficient to effect a therapeutic response in the subject or animal over a reasonable time frame. For example, in the case of cancer, the dose of the therapeutic compound should be sufficient to inhibit metastasis, prevent metastasis, or treat or prevent cancer in a period of from about 2 hours or longer, e.g., 12 to 24 or more hours, from the time of administration. In certain embodiments, the time period could be even longer. The dose can be determined by the efficacy of the particular therapeutic agent and the condition of the subject, as well as the body weight of the subject to be treated. Many assays for determining an administered dose are known in the art.

The dose of the therapeutic compound can also be determined by the existence, nature and extent of any adverse side effects that might accompany the administration of a particular therapeutic compound. The attending physician can decide the dosage of the inhibitor relevant to the invention with which to treat each individual patient using the correlation between polymorphic variant and disease and/or drug efficacies provided by the invention and taking into consideration a variety of factors, such as age, body weight, general health, diet, sex, inhibitor to be administered, route of administration, and the severity of the condition being treated. In some embodiments, the dose of the therapeutic compound is about 0.001 to about 1000 mg/kg body weight of the subject being treated/day, from about 0.01 to about 10 mg/kg body weight/day, or from about 0.01 mg to about 1 mg/kg body weight/day.
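As a purely arithmetic illustration of the weight-based ranges recited above, the following sketch converts a mg/kg body weight/day range into a total daily amount for a given body weight. The function name and the 70 kg example weight are hypothetical and are not taken from the specification; the sketch is not dosing guidance.

```python
def daily_dose_range_mg(body_weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Return (low, high) total daily dose in mg for a weight-based range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low bound exceeds high bound")
    return (body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg)


# Ranges recited in the text, in mg/kg body weight/day.
RANGES_MG_PER_KG = [(0.001, 1000.0), (0.01, 10.0), (0.01, 1.0)]

if __name__ == "__main__":
    # 70 kg is an assumed example weight, chosen only for illustration.
    for low, high in RANGES_MG_PER_KG:
        print(daily_dose_range_mg(70.0, low, high))
```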

Toxicity and therapeutic efficacy of therapeutic agents can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, for example, for determining the LD₅₀ and the ED₅₀. The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀. In some embodiments, compounds that exhibit large therapeutic indices are used. The data obtained from these cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compounds can lie within a range of circulating concentrations that can include the ED₅₀ with little or no toxicity. The dosage can vary within this range depending upon the dosage form and route of administration utilized. The therapeutically effective dose can be estimated initially from cell culture assays. For example, a dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by HPLC.
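The ratio described above can be written out as a short calculation. This is a minimal sketch; the function name and the numeric values in the usage lines are hypothetical and do not describe any compound of the invention.

```python
def therapeutic_index(ld50, ed50):
    """Therapeutic index as the ratio LD50/ED50 (both in the same units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Assumed example values (mg/kg), for illustration only.
    print(therapeutic_index(ld50=500.0, ed50=25.0))
```

A larger value indicates a wider margin between the dose that is therapeutically effective and the dose that is lethal in 50% of the tested population.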

In connection with the administration of a drug, a drug which is “effective against” a disease or condition indicates that administration in a clinically appropriate manner results in a beneficial effect for at least a statistically significant fraction of patients, such as an improvement of symptoms, a cure, a reduction in disease load, reduction in tumor mass or cell numbers, extension of life, improvement in quality of life, or other effect generally recognized as positive by those of skill in the art.

In some embodiments, dosage is in respect to B vitamins administered as part of a therapy for a pregnancy-related complication. The following are dietary reference intakes (DRIs, per diem) for exemplary B vitamins. While in some embodiments a subject is administered a dose about equal to the DRI, generally the subject is administered one or more vitamins in doses greater than the DRI. Vitamin B₂ (riboflavin), DRI of 1.1 milligrams; Vitamin B₆ (pyridoxine), DRI of 1.3 milligrams; Vitamin B₉ (folic acid, folate, pteroylglutamic acid), DRI of 400 micrograms; and Vitamin B₁₂ (cyano-cobalamin), DRI of 2.4 micrograms. Analogs, pro-drugs, salts, and bioactive equivalents of these vitamins can also be employed. For example, other folate-related compounds include folinic acid (5-formyl-tetrahydropteroylglutamate), and other B₁₂-related compounds include methylcobalamin, hydroxycobalamin, and adenosylcobalamin (5′-deoxyadenosylcobalamin, dibencozide).

Kits

The invention includes kits for the detection of polymorphic variants associated with disease states, conditions or complications. The kits can comprise a polynucleotide of at least 30 contiguous nucleotides of one of the variants described herein. In one embodiment, the polynucleotide contains at least one polymorphism of the invention. Alternatively, the 3′ end of the polynucleotide is immediately 5′ to a polymorphic site, preferably a polymorphic site of the invention. In one embodiment, the polymorphic site contains a genetic variant. In still another embodiment, the genetic variant is located at the 3′ end of the polynucleotide. In yet another embodiment, the polynucleotide of the kit contains a detectable label. Suitable labels include, but are not limited to, radioactive labels, such as radionuclides, fluorophores or fluorochromes, peptides, enzymes, antigens, antibodies, vitamins or steroids. The kit may also contain additional materials for detection of the polymorphisms. A kit can contain one or more of the following: buffer solutions, enzymes, nucleotide triphosphates, and other reagents and materials useful for the detection of genetic polymorphisms. Kits can contain instructions for conducting analyses of samples for the presence of polymorphisms and for interpreting the results obtained.

In some embodiments, the kit contains one or more pairs of allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In some embodiments, the kit contains at least one probe or at least one primer or both corresponding to a gene or genes relevant to the invention. The kit can be adapted and configured to be suitable for identification of the presence or absence of one or more polymorphic variants. The kit can contain a plurality of either or both of such probes and/or primers, for example, 2, 3, 4, 5, 6, or more of such probes and/or primers. The plurality of probes and/or primers are adapted to provide detection of a plurality of different sequence polymorphic variants in a gene or plurality of genes, for example, in 2, 3, 4, 5, or more genes or to sequence a nucleic acid sequence including at least one polymorphic variant site in a gene or genes. In some embodiments, the kit contains components for detection of a plurality of polymorphic variants indicative of the effectiveness of a treatment or treatments against a plurality of diseases. Additional kit components can include one or more of the following: a buffer or buffers, such as amplification buffers and hybridization buffers, which may be in liquid or dry form, a DNA polymerase, such as a polymerase suitable for carrying out PCR, and deoxynucleotide triphosphates (dNTPs). Preferably a probe includes a detectable label, for example, a fluorescent label, enzyme label, light scattering label, or other label. Additional components of the kit can also include restriction enzymes, reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means used to label, for example, an avidin-enzyme conjugate and enzyme substrate and chromogen if the label is biotin, and the appropriate buffers for reverse transcription, PCR, or hybridization reactions.

In some kits, the allele-specific oligonucleotides are provided immobilized to a substrate. For example, the same substrate can comprise allele-specific oligonucleotide probes for detecting any or all of the polymorphic variants described herein. Accordingly, the kit may comprise an array including a nucleic acid array and/or a polypeptide array. The array can include a plurality of different antibodies and/or a plurality of different nucleic acid sequences. Sites in the array can allow capture and/or detection of nucleic acid sequences or gene products corresponding to different polymorphic variants in one or more different genes. The array can be arranged to provide polymorphic variant detection for a plurality of polymorphic variants in one or more genes which correlate with the effectiveness of one or more treatments of one or more diseases.

The kit also can contain instructions for carrying out the methods. In some embodiments, the instructions include a listing of the polymorphic variants correlating with a particular treatment or treatments for a disease or diseases. The kit components can be selected to allow detection of a polymorphic variant described herein, and/or detection of a polymorphic variant indicative of a treatment, for example, administration of a drug.

The following examples further illustrate the invention but, of course, should not be construed as in any way limiting its scope.

Example 1

Candidate polymorphic variant sites in folate/homocysteine-related genes were investigated for a potential maternal association with increased risk of developing clinically severe abruptio placentae during pregnancy. [Parle-McDermott et al., Am. J. Med. Genetics, 132A:365-368 (2005).] The polymorphic variants tested included MTHFD1 1958G>A (R653Q), which had not been tested previously in relation to abruptio placentae, and the MTHFR polymorphisms 677C>T (A222V) and 1298A>C (E429A).

Blood samples were obtained from 56,049 pregnant women attending the three main maternity hospitals in Dublin between 1986 and 1990. Samples were taken on the first visit to the clinic and the gestational ages ranged between 15 and 17 weeks. Approximately 90% of births in the Dublin area are delivered in these hospitals as previously described [Kirke et al., Obstet. Gynecol., 89:221-226 (1993)]. In a global context, the Irish can be described as a Caucasian Northern European population [Cavalli-Sforza, Princeton, N.J.: Princeton University Press (1993)]. Moreover, the low level of immigration into Ireland during the last century means that population stratification is unlikely to confound genetic analyses. Pregnancies affected by severe abruptio placentae (n=62) and control pregnancies (n=184) were identified from hospital records in two of the hospitals. The diagnosis of severe abruptio placentae was based on having a retroplacental clot and/or accidental hemorrhage with associated clinical signs of abruption and/or a statement in the case records that the patient was a definite case of abruptio placentae. Data on gestational age at delivery, maternal hypertension, maternal blood transfusion, and pregnancy outcome were collected on all cases. Control pregnancies were selected from women with no history of abruptio placentae, and were matched for the same date and clinic as the cases where the blood sample was provided. Ethical approval was obtained for all samples collected and samples were anonymised prior to genotyping.

Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, UK). Analysis of the MTHFD1 1958G>A (R653Q) polymorphism was performed by PCR-RFLP (restriction fragment length polymorphism) as detailed previously [Brody et al., Am. J. Hum. Genet., 71:1207-1215 (2002)]. Analysis of the MTHFR 677C>T (A222V) polymorphism was performed by PCR-RFLP using Hinf I as previously described [Frosst et al., Nature Genet., 10:111-113 (1995)]. The MTHFR 1298A>C (E429A) polymorphism was PCR amplified as described in van der Put et al., Am. J. Hum. Genet., 62:1044-1051 (1998) and genotyping was carried out via ASO (allele specific oligonucleotide) analysis as described previously [Parle-McDermott et al., J. Hum. Genet., 48:190-193 (2003)].

Allele and genotype frequencies were compared between cases and controls using statistical software (SAS PROC NLMIXED). The odds ratios were calculated using a log linear model by the delta method [Agresti, N.Y.: John Wiley & Sons (1990)] and statistical significance was assessed via the chi-square test. Likelihood ratios (G2) were used to assess goodness of fit of different models; that is, G2 provides a measure of the reliability of the odds ratio (small G2 P-values indicate a poor fit to the model being tested).

Combined MTHFR genotypes were analyzed by estimating (maximum likelihood estimation) the gamete frequencies in cases and controls using a model of the four combinations of alleles as described by Weir, Genetic Data Analysis II, Sunderland, Mass.: Sinauer (1996). A gene-gene interactive effect of MTHFR 677C>T (A222V) with MTHFD1 1958G>A (R653Q) or MTHFR 1298A>C (E429A) was tested using a series of non-hierarchical logistic models [Piegorsch et al., Stat. Med., 13:153-162 (1994)] to estimate interactive dominant and recessive effects.

The case group (n=62) consisted of mothers whose pregnancies were affected by clinically severe abruptio placentae. As expected in severe cases of abruptio placentae, there was considerable co-morbidity. Intrauterine fetal death occurred in 31 of 62 (50%) cases. Blood transfusion was required in 26 of 62 (42%) cases. Maternal antepartum hypertension pre-abruptio was present in 17 of 62 (27%). Preterm delivery (<37 weeks gestational age) occurred in 29 of 62 (47%). Genotyping of the abruptio placentae cases and controls was successful in 100% (246/246) of subjects for MTHFD1 1958G>A (R653Q), 99.2% (244/246) for MTHFR 677C>T (A222V) and 100% (246/246) for MTHFR 1298A>C (E429A). The allele and genotype frequencies and comparisons for each polymorphism are shown in Table I. Although several models were tested for each polymorphism, only the best fitting model (largest goodness of fit (G2) P-value) is shown in Table I.

The ‘Q’ allele of the MTHFD1 1958G>A (R653Q) polymorphism was more common in severe abruptio placentae cases than in controls due to an increase in ‘QQ’ homozygotes among cases (Table I). Thus, pregnant women who are homozygous for the ‘Q’ allele (‘QQ’) have a greater risk of developing severe abruptio placentae during their pregnancy (odds ratio 2.85 (1.47-5.53), P=0.002) compared to women who are heterozygous (‘RQ’) or homozygous wildtype (‘RR’). Among women with severe abruptio placentae, those who were ‘QQ’ homozygous were not significantly more likely than women who were ‘RR’ homozygous wildtype or heterozygous to have hypertension, pre-term deliveries, or to require transfusions. However, an effect may have been missed due to the small number of individuals within each subgroup. The allele frequencies in the controls are similar to those previously reported in the Dutch [Hol et al., Clin. Genet., 53:119-125 (1998)] and Turkish [Akar et al., Acta. Haematol., 102:199-200 (1999)] populations and in previously published Irish control population [Brody et al., Am. J. Hum. Genet., 71:1207-1215 (2002)].
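The allele counts and frequencies reported in Table I follow directly from the genotype counts, since each homozygote carries two copies of an allele and each heterozygote carries one. The sketch below shows that arithmetic for the MTHFD1 R653Q genotype counts given in Table I; the function names are illustrative only.

```python
def allele_counts(n_rr, n_rq, n_qq):
    """Return (R, Q) allele counts from 'RR', 'RQ' and 'QQ' genotype counts."""
    r = 2 * n_rr + n_rq
    q = 2 * n_qq + n_rq
    return r, q


def frequencies(counts):
    """Convert a tuple of counts into a tuple of proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    return tuple(c / total for c in counts)


if __name__ == "__main__":
    # MTHFD1 R653Q genotype counts from Table I.
    cases = allele_counts(18, 23, 21)
    controls = allele_counts(60, 96, 28)
    print(cases, [round(f, 2) for f in frequencies(cases)])
    print(controls, [round(f, 2) for f in frequencies(controls)])
```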

TABLE I
Comparison of MTHFD1 1958G>A (R653Q), MTHFR 677C>T and MTHFR 1298A>C polymorphisms in placental abruption.

                       Genotypes                           Alleles
MTHFD1 R653Q           ‘RR’        ‘RQ’        ‘QQ’        ‘R’          ‘Q’
Abruptio Placentae     18 (.29)¹   23 (.37)    21 (.34)    59 (.48)     65 (.52)
Controls               60 (.33)    96 (.52)    28 (.15)    216 (.59)    152 (.41)
‘Q’ vs. ‘R’            Odds Ratio 1.57 (1.01²-2.44³), P = 0.047⁴
‘QQ’ vs. ‘RQ’/‘RR’     Odds Ratio 2.85 (1.47-5.53), P = 0.002⁵

MTHFR 677C>T           CC          CT          TT          C            T
Abruptio Placentae     26 (.42)    31 (.50)    5 (.08)     83 (.67)     41 (.33)
Controls               80 (.44)    80 (.44)    22 (.12)    240 (.66)    124 (.34)
T vs. C                Odds Ratio 0.96 (0.63-1.44), P = 0.83
TT vs. CT/CC           Odds Ratio 0.64 (0.23-1.76), P = 0.39⁶

MTHFR 1298A>C          AA          AC          CC          A            C
Abruptio Placentae     25 (.40)    31 (.50)    6 (.10)     81 (.65)     43 (.35)
Controls               91 (.49)    75 (.41)    18 (.10)    257 (.70)    111 (.30)
C vs. A                Odds Ratio 1.23 (0.81-1.86), P = 0.33
CC/AC vs. AA           Odds Ratio 1.45 (0.81-2.60), P = 0.21⁷

¹Data in parentheses are allele or genotype frequencies.
²Lower limit of 95% Confidence Interval.
³Upper limit of 95% Confidence Interval.
⁴Assessed with use of chi-squared analysis.
⁵Goodness of fit statistic G2 P = 0.53.
⁶Goodness of fit statistic G2 P = 0.57.
⁷Goodness of fit statistic G2 P = 0.67.

Analysis identified the MTHFD1 1958G>A (R653Q) polymorphism as a genetic risk factor for having a pregnancy affected by severe abruptio placentae. Pregnant mothers who are ‘QQ’ homozygous have almost triple the risk of having this pregnancy complication.
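The ‘QQ’ versus ‘RQ’/‘RR’ odds ratio can be checked directly from the genotype counts in Table I. The following is a minimal sketch only: the function name `odds_ratio_ci` is hypothetical, and the use of Woolf's log-based 95% confidence interval is an assumption made for illustration, not necessarily the procedure used to generate the table.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a 2x2 table.

    The interval is Woolf's log-based interval (an assumption for this sketch).
    """
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / case_exposed + 1 / case_unexposed
                   + 1 / ctrl_exposed + 1 / ctrl_unexposed)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Table I counts: 'QQ' cases = 21, other cases = 18 + 23,
# 'QQ' controls = 28, other controls = 60 + 96.
or_qq, lower, upper = odds_ratio_ci(21, 18 + 23, 28, 60 + 96)
print(f"QQ vs RQ/RR: OR {or_qq:.2f} ({lower:.2f}-{upper:.2f})")
```

Under these assumptions the sketch gives approximately 2.85 (1.47-5.53) for the ‘QQ’ comparison.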

Case-control comparisons of the MTHFR polymorphisms 677C>T (A222V) and 1298A>C (E429A) did not reveal significant differences between cases and controls (Table I). The association between MTHFR 677C>T and 1298A>C was also examined and in agreement with previous MTHFR data [Parle-McDermott et al., J. Hum. Genet., 48:190-193 (2003)], there was clear evidence of linkage disequilibrium between the two polymorphisms in both cases and controls. However, analysis of combined MTHFR genotypes showed similar frequencies in cases and controls, indicating that there is no interactive effect of these MTHFR polymorphisms on risk of abruptio placentae; this finding was confirmed by the non-hierarchical logistic model analysis. Therefore, the MTHFR 677C>T (A222V) and 1298A>C (E429A) polymorphisms are in linkage disequilibrium but do not show an association with severe abruptio placentae risk in this cohort when analyzed either independently or in combination.

Combined analysis of MTHFR 677C>T (A222V) with MTHFD1 1958G>A (R653Q) genotypes by the non-hierarchical logistic model analysis also did not show any significant effects and therefore, there does not appear to be an interactive effect of these two polymorphisms and risk of severe abruptio placentae. Analysis of the MTHFR polymorphisms 677C>T (A222V) and 1298A>C (E429A) in the largest group of clinically defined severe abruptio placentae patients to date (n=62) and controls (n=182) does not support their role as genetic risk factors.

Pregnant women who are homozygous for the MTHFD1 R653Q polymorphism i.e., ‘QQ’, are almost three times more likely to develop severe abruptio placentae than pregnant women who are either heterozygous (‘RQ’) or homozygous wildtype (‘RR’) (odds ratio 2.85 (1.47-5.53), P=0.002). The possibility of fetal DNA differentially affecting the results due to fetal-maternal transfusion can be ruled out as all the blood samples were taken between 15-17 weeks gestation, prior to the diagnosis of placental abruption. Moreover, the genetic consequences of this possibility would be to increase the apparent number of heterozygotes in affected mothers.

Without being held to any particular theory of mechanism, the following theories have been contemplated. The effect of the MTHFD1 R653Q polymorphism appears to act through the ‘QQ’ homozygous genotype. Even if the MTHFD1 R653Q polymorphism does not have a direct effect on folate and homocysteine levels, this polymorphism may alter nucleotide pools available for DNA synthesis and thus affect cell division. The MTHFD1 653 ‘Q’ allele, which resides in the synthetase domain of the trifunctional enzyme, may be less efficient at DNA synthesis particularly when folate status is low. This lower efficiency may produce effects at the cellular level without causing major perturbations in plasma metabolites. Alternatively, this polymorphism may be in linkage disequilibrium with an unknown variant that alters enzyme activity.

Example 2

This study investigated whether the MTHFD1 1958G>A, MTHFR 677C>T, or TCNII 776C>G polymorphism influences the maternal genetic risk of second trimester pregnancy loss. Cases and controls were derived from a bank of blood samples from 56,049 pregnant women drawn during their first clinical visit at the three main Dublin maternity hospitals between 1986 and 1990. These hospitals deliver approximately 90% of births within the Dublin area as previously described [Kirke, et al., Q. J. Med., 86:703-708 (1993)]. This bank of samples is representative of a homogeneous population and, due to the low level of immigration into Ireland during the collection period, population stratification is unlikely to confound the genetic analyses performed. Women with a history of at least one unexplained second trimester pregnancy loss (n=125) during a previous pregnancy were identified retrospectively from the computerised records of the Coombe Women's Hospital. Individual chart reviews were then performed to confirm the details of each case. Cases were women with a previous history of spontaneous abortion or in utero fetal demise occurring spontaneously between 13 and 26 weeks gestation. Women in whom a clinical explanation for the spontaneous abortion or fetal death was apparent were excluded. Thus, women with incompetent cervix, preterm premature rupture of membranes, preterm labor, placental abruption, maternal medical disease, or fetal malformations were not included. The control group (n=625) consisted of a systematic random sample of women from the same bank. Data on parity and maternal age when the blood sample was collected were available for all cases except one and for 118/625 of the controls. Personal identifiers were removed from all samples prior to genetic testing. Appropriate ethical approval was obtained for all samples collected.

Genomic DNA was extracted from cases and controls using the QIAamp DNA Blood Mini Kit from Qiagen, UK. Genotyping of the MTHFR 677C>T and MTHFD1 1958G>A polymorphisms was performed using PCR-RFLP (Restriction Fragment Length Polymorphism) using Hinf I and Msp I respectively as previously described [Frosst et al., Nature Genet., 10:111-113 (1995); Hol et al., Clin. Genet., 53:119-125 (1998); Brody et al., Am. J. Hum. Genet., 71:1207-1215 (2002)]. The TCNII 776C>G polymorphism was genotyped using an allele-specific primer extension assay and scored by matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry (Sequenom, San Diego). Appropriate controls were included in all assays and genotyping consistency was tested by analyzing between 10-15% of samples in duplicate, resulting in 100% agreement.

The PCR conditions used for the experiment are set out in part in Table II. The reactions were set up on ice and run with thermocycle program “MTHFDRQ” using three GeneAmp PCR 9700 machines. When the temperature was approx. 90° C., the machine was paused, the tray was placed inside the machine, and the program was allowed to run. The program “MTHFDRQ” comprises the following parameters: 95° C. 3 mins, (94° C. 30 secs, 58° C. 1 min, 72° C. 1 min)×35 cycles, 72° C. 10 mins, Hold at 15° C. The primers used were R653Q Forward Primer 5′ cactccagtgtttgtccatg 3′ (SEQ ID NO: 19) and R653Q Reverse Primer 5′ gcatcttgagagccctgac 3′ (SEQ ID NO: 20). The primer stocks were diluted as follows: 1/25 (60 μl+1,440 μl water) for the forward primer and 1/23 (65 μl+1,435 μl water) for the reverse primer.

TABLE II
PCR Reactants

Reagent                         100 Reactions       Per Reaction
10 x PCR BUFFER                 500 μl              5 μl
25 mM MgCl₂                     300 μl              3 μl
2.5 mM dNTPs                    400 μl              4 μl
F primer 1/25 (10 pmol/μl)      250 μl              2.5 μl
R primer 1/23 (10 pmol/μl)      250 μl              2.5 μl
Taq (5 U/μl, Sigma)             20 μl               0.2 μl
H₂O*                            3180 μl             31.8 μl
DNA*                            1 μl + 49 μl Mix    1 μl

*Adjust H₂O volume depending on how much DNA is added.
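The per-reaction volumes of Table II scale linearly to a master mix. The sketch below is illustrative only: the dictionary keys, the function name `master_mix`, and the optional `overage` parameter (extra volume to cover pipetting loss) are assumptions and are not part of the protocol as described.

```python
# Per-reaction volumes in microliters, taken from Table II (DNA is added separately).
PER_REACTION_UL = {
    "10x PCR buffer": 5.0,
    "25 mM MgCl2": 3.0,
    "2.5 mM dNTPs": 4.0,
    "F primer (10 pmol/ul)": 2.5,
    "R primer (10 pmol/ul)": 2.5,
    "Taq (5 U/ul)": 0.2,
    "H2O": 31.8,
}

def master_mix(n_reactions, overage=0.0):
    """Scale per-reaction volumes to n_reactions; overage is a hypothetical fractional excess."""
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(volume * factor, 2)
            for reagent, volume in PER_REACTION_UL.items()}

mix = master_mix(100)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} ul")
print(f"Total mix: {round(sum(mix.values()), 2)} ul")
```

With `overage` left at zero, the 100-reaction volumes correspond to the second column of Table II, and the mix totals 49 μl per reaction before DNA is added.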

The PCR products were digested with restriction enzyme MspI as indicated in Table III. Digestions took place in a 37° C. water bath for at least 3 hours, and can also be left overnight.

TABLE III
PCR Product Digest Parameters

Reagent              100 Digests         Per Digest
Msp I (20 U/μl)      100 μl              1 μl
NEB2 Buffer          300 μl              3 μl
H₂O                  1,100 μl            11 μl
PCR product          15 μl + 15 μl Mix   15 μl

The products of the digest were mixed with Orange G loading dye and all loaded on a 1.5% agarose gel (use centipede with large combs: half a tray per gel) and allowed to run until the Orange G was just at the bottom of the gel. The bottom half of the gel can be stained in an ethidium bromide bath. The uncut product should be approximately 330 bp. For an AA genotype, the digest products should be approximately 267 bp and 71 bp. For a GG genotype, the digest products should be approximately 196 bp, 71 bp, and 55 bp. For an AG genotype, the digest products should be approximately 267 bp, 196 bp, 71 bp, and 55 bp.
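The banding patterns described above can be expressed as a simple genotype-calling rule. This is a sketch under stated assumptions: the function name `call_genotype`, the 15 bp sizing tolerance, and the decision to return no call when an uncut band is present are all hypothetical choices for illustration. The call rests on the 267 bp and 196 bp bands, which are the two fragments that differ between genotypes.

```python
UNCUT_BP = 330       # approximate size of the undigested PCR product
A_ALLELE_BP = 267    # band seen with the A allele
G_ALLELE_BP = 196    # band seen with the G allele

def call_genotype(observed_bp, tolerance_bp=15):
    """Return 'AA', 'GG', 'AG', or None from a list of observed band sizes (bp).

    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    def has_band(size):
        return any(abs(band - size) <= tolerance_bp for band in observed_bp)

    if has_band(UNCUT_BP):
        return None  # undigested product present: treat as no call (assumption)
    a_band = has_band(A_ALLELE_BP)
    g_band = has_band(G_ALLELE_BP)
    if a_band and g_band:
        return "AG"
    if a_band:
        return "AA"
    if g_band:
        return "GG"
    return None

print(call_genotype([268, 195, 72, 56]))
```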

The association between case-control status and genotype was examined using a number of standard odds ratios. In order to have a common approach for all analyses, a log linear model was employed. The statistical software (SAS PROC NLMIXED) allows estimation of nonlinear functions of the parameters of the model, and provides standard errors calculated using the delta method [Agresti, Categorical Data Analysis (1990)]. The parameterization of the model can easily be modified for the computation of different odds ratios. This approach enabled estimation of log odds ratios and their standard errors for the computation of confidence intervals, as well as checking the goodness of fit of different models. Potential gene-gene interaction effects were also examined. Tests of interactive dominant or recessive effects of specific combined genotypes were performed using a series of non-hierarchical logistic regression models [Piegorsch et al., Stat. Med. 13:153-162 (1994)]. Statistical significance was assessed using likelihood ratio chi-square tests.

The majority of cases (116/125) had experienced just one second trimester pregnancy loss. The remaining cases experienced two (n=7) or three (n=2) second trimester pregnancy losses. The average age of the study cases was 30+/−5.23 and controls were 26.3+/−5.09 (data on just 118/625 controls). Among the case group 12% of women had a parity of 0 and 88% had a parity of 1. Among the control group where data was available, 43% had a parity of 0 and 57% had a parity of 1.

Three polymorphisms were genotyped in the second trimester pregnancy loss case (n=125) and control (n=625) groups with 98.9% of all subjects successfully genotyped for MTHFD1 1958G>A, 98.4% for MTHFR 677C>T and 97.8% for TCNII 776C>G. Comparison of allele and genotype frequencies between cases and controls is shown in Table IV.

TABLE IV
COMPARISON OF MTHFD1 1958G>A, MTHFR 677C>T AND TCNII 776C>G POLYMORPHISMS IN MOTHERS WITH A HISTORY OF SECOND TRIMESTER PREGNANCY LOSS AND CONTROL MOTHERS.

                    Genotypes                                 Alleles
MTHFD1 1958G>A      GG            AG            AA            G            A
Case Mothers        32 (.26)¹     58 (.47)      33 (.27)      122 (.50)    124 (.50)
Control Mothers     173 (.28)     333 (.54)     113 (.18)     679 (.55)    559 (.45)
A vs. G: Odds Ratio 1.23 (95% CI 0.93-1.63), P = 0.14²
AA vs. AG/GG: Odds Ratio 1.64 (95% CI 1.05-2.57), P = 0.03³

MTHFR 677C>T        CC            CT            TT            C            T
Case Mothers        55 (.44)      55 (.44)      14 (.11)      165 (.67)    83 (.33)
Control Mothers     271 (.44)     270 (.44)     73 (.12)      812 (.66)    416 (.34)
T vs. C: Odds Ratio 0.98 (95% CI 0.73-1.31), P = 0.90
TT vs. CT/CC: Odds Ratio 0.94 (95% CI 0.51-1.73), P = 0.85⁴

TCNII 776C>G        CC            CG            GG            C            G
Case Mothers        33 (.27)      61 (.50)      29 (.24)      127 (.52)    119 (.48)
Control Mothers     184 (.30)     306 (.50)     121 (.20)     674 (.55)    548 (.45)
C vs. G: Odds Ratio 1.15 (95% CI 0.88-1.52), P = 0.31
GG vs. CC/CG: Odds Ratio 1.25 (95% CI 0.79-1.98), P = 0.34⁵

¹Data in parentheses are allele or genotype frequencies; ²Chi-squared analysis; ³Goodness of fit statistic G2 P = 0.80; ⁴Goodness of fit statistic G2 P = 0.99; ⁵Goodness of fit statistic G2 P = 0.65

The MTHFD1 1958AA genotype is clearly enriched in the second trimester pregnancy loss case group compared to controls. MTHFD1 1958AA women have a significantly higher risk of an unexplained second trimester pregnancy loss than women who are MTHFD1 1958AG or 1958GG (odds ratio 1.64 (1.05-2.57), P=0.03). The control group shows deviation from Hardy-Weinberg equilibrium with slightly more MTHFD1 1958AG heterozygotes than expected (P=0.03). Published frequencies from other populations including Dutch (Hol et al., Clin Genet. 53, 119-125 (1998)), Turkish (Akar et al., Thromb. Res., 102:115-120 (2001)), Italian (De Marco et al., Annual Meeting of the Society for Research into Hydrocephalus and Spina Bifida, Dublin, 23-26 (2004)) and Mexican (Shi et al., Birth Defects Res. Part A Clin. Mol. Teratol., 67:545-549 (2003)) are also skewed toward heterozygote excess, although these deviations from Hardy-Weinberg equilibrium are not statistically significant.
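The Hardy-Weinberg observation for the control group can be illustrated with the control genotype counts of Table IV. The sketch below assumes a Pearson goodness-of-fit test with one degree of freedom; the function name `hwe_chi_square` is hypothetical, and the test actually used in the study is not specified here.

```python
import math

def hwe_chi_square(n_gg, n_ag, n_aa):
    """Pearson chi-squared test of Hardy-Weinberg proportions for a biallelic locus.

    Returns (chi-squared, P value for 1 df, expected genotype counts).
    """
    n = n_gg + n_ag + n_aa
    p = (2 * n_gg + n_ag) / (2 * n)   # frequency of the G allele
    q = 1.0 - p                       # frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_gg, n_ag, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected

# Control mothers, MTHFD1 1958G>A (Table IV): GG = 173, AG = 333, AA = 113
chi2, p_value, expected = hwe_chi_square(173, 333, 113)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}, expected AG = {expected[1]:.1f}")
```

Under these assumptions, the expected number of heterozygotes is lower than the 333 observed, and the P value is close to the 0.03 stated above.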

Increased frequencies of the TCNII 776G allele (48% vs 45%) and the 776GG genotype (24% vs 20%) were observed in cases compared to controls (Table IV). Although this difference was not statistically significant, the TCNII 776C>G polymorphism cannot be completely ruled out as a risk factor for second trimester loss. Comparison of the allele and genotype frequencies of the MTHFR 677C>T polymorphism showed no difference between cases and controls. Thus, the MTHFR 677C>T polymorphism does not appear to be a significant risk factor for unexplained second trimester pregnancy loss in the Irish population.

Data were also examined for the possibility of combined genetic factors having an additive effect on risk of second trimester loss. The following genotype combinations were examined for the possibility of an interactive effect: MTHFD1 1958AA and MTHFR 677TT (OR 1.25, P=0.75), MTHFD1 1958AA and TCNII 776GG (OR 1.20, P=0.75) or 776CG/GG (OR 1.16, P=0.77), MTHFR 677TT and TCNII 776GG (OR 0.81, P=0.78) or 776CG/GG (OR 0.70, P=0.59). The results of these analyses show no significant genotype interactive effects on the risk of second trimester pregnancy loss.

While there has been some evidence to support a role of the MTHFR 677C>T polymorphism as both a maternal and fetal genetic risk factor for early pregnancy loss [Reviewed in Zetterberg et al., Reprod. Biol. Endocrinol., 2:7 (2004)], analysis of the MTHFR 677C>T polymorphism in the second trimester cohort showed no evidence of an association.

The results indicate that the maternal TCNII 776C>G genotype does not independently contribute to risk of second trimester pregnancy loss. Although the 776GG genotype showed an increased frequency in the second trimester case group compared to controls (24% vs 20%), this result was not statistically significant.

Although the variants in MTHFR and TCNII were not found to be independent maternal risk factors, each may contribute to second trimester loss in combination with some other factor. For example, an interactive effect between TCNII 776CG or 776GG and MTHFR 677TT on early fetal loss has been reported [Zetterberg et al., Hum. Reprod., 18:1948-1950 (2003)]. While an odds ratio comparison showed that these genotype combinations were significantly higher in spontaneously aborted fetuses, a statistical method that differentiates between independent and interactive effects would have tested more effectively whether these polymorphisms act synergistically. Logistic regression analysis was applied to this data [reconstructed from Zetterberg et al., Eur. J. Hum. Genet., 10:113-118 (2002), Zetterberg et al., Hum. Reprod., 17:3033-3036 (2002), and Zetterberg et al., Hum. Reprod., 18:1948-1950 (2003)]. This method confirmed that each polymorphism acts as an independent risk factor (TCNII 776CG or 776GG, P=0.0006; MTHFR 677TT P=0.033) but the interaction between MTHFR 677TT and TCNII 776CG/GG was not significant (P=0.77). Similarly, no evidence was found in the analysis of second trimester pregnancy loss cases and controls for interactive effects between the MTHFR 677TT and TCNII 776GG (P=0.78) or TCNII 776CG/GG (P=0.59).

The study did consider maternal age and the mean age among cases was 30 years, well under the threshold (35+ years) at which substantially increased complications related to maternal age are expected [Cunningham and Leveno, N. Engl. J. Med., 333:1002-1004 (1995)]. All losses with fetal malformations were excluded. Even if miscarriages due to unrecognized NTDs were present in the study, such miscarriages were unlikely to have had a significant impact on the analyses as the rate of NTD associated pregnancy losses is 1/50.

This experimental study has identified the MTHFD1 1958AA genotype as an independent maternal risk factor for unexplained pregnancy loss during the second trimester of pregnancy. Analyses of the MTHFR 677C>T and TCNII 776C>G polymorphisms did not indicate that these variants either independently or in combination had any significant effect on risk of pregnancy loss.

Example 3

The second trimester study of Example 2 is repeated, but also gathering data for maternal risk factors such as tobacco or alcohol use, which contribute to fetal loss. In addition, prenatal diagnosis and routine ultrasound can be performed. Genetic testing as described above for MTHFD1, MTHFD1L, etc. is carried out. These genetic test results can be combined with the risk associated with alcohol or tobacco use. The resulting risk estimate provides greater accuracy than those based on genetic testing or environmental exposure measurements alone.

Example 4

The second trimester study of Example 2 is repeated, but with further testing for inherited or acquired thrombophilia. This testing involves testing for the polymorphic variants described herein in respect to F2 and F5. By testing for multiple risk factors, one can achieve greater predictive value.

Example 5

The results in Example 2 showed significantly more 1958AG heterozygotes in the general population than expected, and the apparent selection against transmission of the 1958A allele in the earlier MTHFD1 NTD study suggested that the 1958G>A polymorphism in the fetus may also have a role in fetal loss. The second trimester study is repeated with the spontaneously aborted embryos/fetuses tested for the MTHFD1 1958G>A polymorphism, with the tentative prediction that more than expected would carry the 1958AA genotype.

Example 6

The following study revealed a correlation between neural tube defects and the rs3832406 polymorphism of MTHFD1L: a particular variant of this polymorphism is predictive of an increased susceptibility for having a child with a neural tube defect.

The study group consisted of NTD-affected children plus their parents (triads) who were recruited throughout Ireland from 1993 to date with the assistance of various branches of the Irish Association for Spina Bifida and Hydrocephalus. The NTD population comprised 387 NTD cases, 349 fathers of NTD cases and 386 mothers of NTD cases. The control population (n=280) was obtained between 1986 and 1990 from 56,049 pregnant women attending the three main maternity hospitals in the Dublin area. Details of this collection have been described previously [Kirke, et al., Q. J. Med., 86:703-708 (1993)]. Informed consent and ethical approval were obtained for all samples collected.

Extraction of genomic DNA was carried out using the QIAamp DNA Blood Mini Kit, Qiagen, UK. Genotyping of the MTHFD1L intron 7 deletion insertion polymorphism, rs3832406, was carried out under the conditions outlined below. The sequences for the PCR primers were as follows: MTHFD1L.F 5′ *TTCTCTTTCTTAGCCCCACG 3′ (SEQ ID NO: 21) and MTHFD1L.R 5′ AGAGCTTGCAGTGAGCCTAGA 3′ (SEQ ID NO: 22), where * denotes a 6-FAM (blue) label. An ABI GeneAmp PCR system 9700 was used for the thermocycling using the following program conditions: 94° C. 3 mins, (94° C. 30 secs, 60° C. 30 secs, 72° C. 30 secs)×35 cycles, 72° C. 5 mins. PCR reactant parameters are provided in Table V.

TABLE V
MTHFD1L PCR REACTANTS

Reagent                         100 Reactions           Per Reaction
10 x PCR BUFFER                 250 μl                  2.5
25 mM MgCl₂                     150 μl                  1.5
2.5 mM dNTPs                    200 μl                  2
F primer 1/90 (10 pmol/μl)      200 μl                  2
R primer 1/20 (10 pmol/μl)      50 μl                   0.5
Taq (5 U/μl, Sigma)             10 μl                   0.1
H₂O*                            1390 μl                 13.9
DNA*                            2.5 μl + 22.5 μl Mix

PCR products were resolved on a 6% denaturing polyacrylamide gel on an ABI 377 DNA sequencer and sized using the Genescan software. Genotypes were analysed using the Genotyper software. Analysis of the transmission of alleles from parents to affected NTD case was performed using an extended transmission disequilibrium test as described by Sham and Curtis, Ann. Hum. Genet., 59(Pt 3):323-36 (1995), using the ETDT software. Allele and genotype frequencies were compared between NTD groups and controls and statistical significance was assessed by chi-squared analysis. The allele and genotype frequencies for MTHFD1L intron 7 deletion insertion polymorphism, rs3832406, are shown in Table VI. The alleles are represented by the following numbers: Allele 1=7 x ATT; Allele 2=8 x ATT; Allele 3=9 x ATT.

TABLE VI
Allele and Genotype Frequencies in NTD Groups and Controls

              Cases          Fathers        Mothers        Controls
Genotypes
1-1           184 (.49)      134 (.39)      164 (.44)      107 (.39)
1-2           74 (.20)       91 (.27)       84 (.23)       75 (.28)
1-3           69 (.19)       66 (.19)       75 (.20)       58 (.21)
2-2           10 (.03)       14 (.04)       10 (.03)       14 (.05)
2-3           23 (.06)       16 (.05)       22 (.06)       10 (.04)
3-3           12 (.03)       19 (.06)       18 (.05)       8 (.03)
Total         387 (96.1%)    349 (97.4%)    386 (96.6%)    280 (97.1%)
H-W (2df)     P = 0.044      P = 0.004      P = 0.080      P = 0.102
Alleles
1             511 (.69)      425 (.63)      487 (.65)      347 (.64)
2             117 (.16)      135 (.20)      126 (.17)      113 (.21)
3             116 (.16)      120 (.18)      133 (.18)      84 (.15)
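The allele rows of Table VI follow from the genotype rows by counting two alleles per genotyped individual. A minimal sketch using the NTD case column is given below; the function name `allele_counts` and the data layout are hypothetical.

```python
from collections import Counter

def allele_counts(genotype_counts):
    """Tally alleles from a mapping of 'a-b' genotype labels to individual counts."""
    counts = Counter()
    for genotype, n_individuals in genotype_counts.items():
        for allele in genotype.split("-"):
            counts[allele] += n_individuals  # a homozygote contributes the allele twice
    return counts

# Genotype counts for NTD cases (Table VI)
cases = {"1-1": 184, "1-2": 74, "1-3": 69, "2-2": 10, "2-3": 23, "3-3": 12}
counts = allele_counts(cases)
total = sum(counts.values())
for allele in sorted(counts):
    print(f"Allele {allele}: {counts[allele]} ({counts[allele] / total:.2f})")
```

For the case column this tally gives 511, 117 and 116 copies of alleles 1, 2 and 3 respectively, out of 744 chromosomes.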

Comparison of cases to controls showed that the “1-1” genotype appears to be associated with increased risk of an NTD. In contrast, the “2” allele appears to be protective. The ETDT test confirmed the case versus control comparisons and showed over transmission of the “1” allele from parents to affected offspring, while the “2” allele showed under transmission. A summary of this analysis is shown in Table VII.

TABLE VII
Case Vs Control Comparisons

Allele
1 Vs 2/3                     OR 0.80 (0.64-1.01)    P = 0.066
2 Vs 1/3                     OR 1.41 (1.06-1.87)    P = 0.020
3 Vs 1/2                     OR 0.99 (0.73-1.34)    P = 0.941
Genotypes
1-1 Vs the Rest              OR 1.51 (1.10-2.07)    P = 0.011
2-2/1-2/2-3 Vs the Rest      OR 0.71 (0.51-0.99)    P = 0.041
3-3 Vs the Rest              OR 1.10 (0.44-2.73)    P = 0.837
If ignore Allele 3:
1-1 Vs 1-2/2-2               OR 1.83 (1.25-2.68)    P = 0.002

1-1 genotype = Risk
1-2 or 2-2 = Protective
Other genotypes = no effect

A logistic regression TDT was performed using the extended transmission disequilibrium test software (ETDT; Sham and Curtis 1995, supra). The results were as follows: Chi-squared for allele-wise TDT=2*(L1−L0)=9.496, 2df, P=0.0087. Chi-squared for genotype-wise TDT=2*(L2−L0)=10.887, 3df, P=0.0124. Chi-squared for goodness of fit of allele-wise model=2*(L2−L1)=1.391, 1df, P=0.238. L0=Log likelihood under the null hypothesis of equal transmission probabilities. L1=Log likelihood under the alternative hypothesis that transmission probabilities are determined in an allele specific way. L2=Log likelihood under the hypothesis that transmission probabilities may be independent for each genotype, that is, alleles are transmitted in a genotype specific fashion.
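The P values quoted for these likelihood ratio statistics can be checked with the closed-form upper-tail probabilities of the chi-squared distribution for one, two, and three degrees of freedom. The helper name `chi2_upper_tail` is hypothetical, and this sketch only converts the reported statistics to P values; it does not reproduce the ETDT likelihood fitting itself.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-squared statistic x for df = 1, 2 or 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only 1, 2 or 3 degrees of freedom are handled in this sketch")

# Statistics of the form 2*(L_alternative - L_null) as reported above
for label, statistic, df in [("allele-wise TDT", 9.496, 2),
                             ("genotype-wise TDT", 10.887, 3),
                             ("goodness of fit", 1.391, 1)]:
    print(f"{label}: chi2 = {statistic}, {df} df, P = {chi2_upper_tail(statistic, df):.4f}")
```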

A summary of transmissions from all heterozygous parents is provided in Table VIII; maternal and paternal results are displayed in Tables IX and X respectively.

TABLE VIII
Summary of Transmissions from All Heterozygous Parents

                      Allele 1      Allele 2      Allele 3
Passed                137 (58%)     63 (38%)      66 (51%)
Not Passed            100 (42%)     102 (62%)     64 (49%)
Chi-Squared (1df)     5.776         9.218         0.031
P-values$             0.0163        0.0024        0.8608

$these values can be corrected for multiple testing.

TABLE IX
Maternal Transmissions only

                      Allele 1      Allele 2      Allele 3
Passed                65 (61%)      28 (41%)      27 (42%)
Not Passed            41 (39%)      41 (59%)      38 (58%)
Chi-Squared (1df)     5.434         2.449         1.862
P-values$             0.0198        0.1176        0.1725

Chi-squared for allele-wise TDT = 2*(L1 − L0) = 5.489, 2df, P = 0.064
Chi-squared for genotype-wise TDT = 2*(L2 − L0) = 7.115, 3df, P = 0.068
Chi-squared for goodness of fit of allele-wise model = 2*(L2 − L1) = 1.627, 1df, P = 0.202
$these values can be corrected for multiple testing.

TABLE X
Paternal Transmissions only

                      Allele 1      Allele 2      Allele 3
Passed                63 (56%)      30 (35%)      35 (61%)
Not Passed            50 (44%)      56 (65%)      22 (39%)
Chi-Squared (1df)     1.496         7.860         2.965
P-values$             0.2214        0.0051        0.0852

Chi-squared for allele-wise TDT = 2*(L1 − L0) = 9.341, 2df, P = 0.009
Chi-squared for genotype-wise TDT = 2*(L2 − L0) = 9.404, 3df, P = 0.024
Chi-squared for goodness of fit of allele-wise model = 2*(L2 − L1) = 0.062, 1df, P = 0.802
$these values can be corrected for multiple testing.
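The per-allele chi-squared values in Tables VIII to X are consistent with the basic one degree of freedom transmission disequilibrium statistic, (passed − not passed)² / (passed + not passed). The sketch below is written under that assumption; the function name `tdt_statistic` is hypothetical, and the ETDT software itself may compute these figures differently.

```python
import math

def tdt_statistic(passed, not_passed):
    """Return (chi-squared, P value) for one allele's transmissions from heterozygous parents."""
    chi2 = (passed - not_passed) ** 2 / (passed + not_passed)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Transmissions from all heterozygous parents (Table VIII)
for allele, passed, not_passed in [(1, 137, 100), (2, 63, 102), (3, 66, 64)]:
    chi2, p_value = tdt_statistic(passed, not_passed)
    print(f"Allele {allele}: chi2 = {chi2:.3f}, P = {p_value:.4f}")
```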

The 1-1 genotype appears to be a risk genotype for NTDs. Preferential transmission of allele 1 is observed in the TDT. Having at least one copy of allele 2, i.e., the 1-2, 2-2 or 2-3 genotypes, appears to protect against NTDs. The TDT shows that allele 2 is transmitted significantly less often than expected. Allele 3 appears to have no effect on risk of NTDs. The fathers and NTD cases are significantly out of Hardy-Weinberg equilibrium; presumably this situation is driven by the case genotypes.

Example 7

The hypothesis being tested in the following series of experiments is that polymorphism rs3832406 within the MTHFD1L gene affects the splicing efficiency of the alternative transcript and could ultimately impact on the level of mitochondrial 10-formyltetrahydrofolate synthase.

Confirmation of the Alternatively Spliced Transcript

Total RNA was extracted from transformed lymphoblast cell lines using Ultraspec™ II (Biotex, Houston, USA). These cell lines were obtained from the Coriell Cell Repository, having been transformed by culturing primary lymphocytes with Epstein-Barr Virus (EBV). RNA from five cell lines was pooled, although pooling need not be carried out for this experiment. These five cell lines and their genotypes were 15083 (7ATT/7ATT), 17102 (7ATT/7ATT), 17133 (7ATT/8ATT), 17219 (7ATT/7ATT), and 17259 (7ATT/8ATT). DNase I (Invitrogen) treated RNA (1 μg) was reverse transcribed using Superscript II (Invitrogen) as described by the manufacturer. PCR primers were designed to amplify both transcripts, the 1.1 kb transcript only, or the 3.6 kb transcript only (Table XI). The results of this experiment confirm the presence of both transcripts that are specific to the MTHFD1L gene.

TABLE XI
Primer Sequence Details for RT-PCR Assays

Primer Sequences                                      mRNA                 PCR Temp.
Forward (SEQ ID NO: 23): CCATCGTCAGAGAAGTCATTCA       1.1 kb and 3.6 kb    56° C.
Reverse (SEQ ID NO: 24): CTGGTTGATTTCCTGCATCA
Forward (SEQ ID NO: 25): GGTCTTTGGAAGCTGCTCTACA       1.1 kb only          58° C.
Reverse (SEQ ID NO: 26): TTGCAGTGAGCCTAGATCACG
Forward (SEQ ID NO: 27): GATCACACCCACCCCTCTTG         3.6 kb only          58° C.
Reverse (SEQ ID NO: 28): CCTCCTTTCACTCCAAACGTC

Determination of mRNA Levels

Taqman® assays are performed to examine the levels of MTHFD1L mRNA in relation to the rs3832406 polymorphism. Lymphoblast Coriell cell lines that are representative of rs3832406 genotypes have been identified. Total RNA is extracted and DNase I treated as described above. Taqman® assays have been designed to distinguish the expression level of the long and short transcripts of MTHFD1L. A control assay that detects both transcripts is localized between Exons 1 and 2 of the MTHFD1L mRNA transcript (Applied Biosystems (ABI) assay ID Hs_(—)00920574). A second assay detects the longer transcript only and is localized to Exons 19/20 (ABI assay ID Hs_(—)003836161). A third assay has been custom designed by ABI and is localized to Exons 7/8A. These assays will be used to examine the relative expression levels of both transcripts to determine if there are differences that are correlated with rs3832406 genotype.

Folate/Homocysteine Levels

A correlation between the rs3832406 polymorphism and folate/homocysteine levels is determined. A collection of DNA samples for which folate and homocysteine levels have already been assayed is genotyped for the rs3832406 polymorphism using the procedures described herein. A correlation may be found between genotype and folate/homocysteine levels. As folate and homocysteine levels may predict vascular disease and cancer risk, genotypes at rs3832406 may prove useful in estimating the risk for these diseases.

Example 8

The objective of these experiments is to determine if a polymorphism in MTHFD1L, for example, rs3832406, has an effect on the efficacy or proper dosage for a chemotherapeutic drug such as 5-fluorouracil (5-FU), and more generally whether a particular variant has an effect on the metabolic pathways that affect 5-FU/folinic acid (FA) action. The variable response of patients to administration of 5-FU, or of other drugs relevant to folate metabolism, can be used in identifying polymorphic variants responsible for such variable response. As described above, those polymorphic variants can then be used in diagnostic tests and methods of treatment.

5-fluorouracil (5-FU) is a widely used chemotherapy drug. The effectiveness of 5-FU is potentiated by folinic acid (FA; generic name: leucovorin). The combination of 5-FU and FA is standard therapy for stage III/IV colon cancer. 5-FU is used in the standard treatment of gastrointestinal cancers, such as colorectal cancer, and of breast and head and neck cancers. Clinical trials have also shown responses in cancer of the bladder, ovary, cervix, prostate and pancreas. Patient responses to 5-FU and 5-FU/FA vary widely, ranging from complete remission of cancer to severe toxicity.

This study compares the variant frequency distribution of the MTHFD1L rs3832406 polymorphism between groups of patients with solid tumors who are treated with a weekly or monthly regimen of 5-FU+FA. The groups are defined by level of toxicity, graded according to the NCI common toxicity criteria, as follows: Group 1: patients with high toxicity (grade III/IV on NCI criteria); Group 2: patients with minimal toxicity (grade 0/I/II on NCI criteria). This study helps determine whether the seven, eight, nine, or other multiple “ATT” repeat polymorphic variant affects the efficacy of the 5-FU+FA regimen, and can be readily adapted to test other drug regimens as well. Analyses are performed globally, then by regimen (monthly vs. weekly) and by type of toxicity (gastrointestinal vs. bone marrow). The statistical significance of the differences between polymorphic variant frequencies can be assessed by a Pearson chi-squared test of homogeneity of proportions with n−1 degrees of freedom.
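The Pearson chi-squared test of homogeneity mentioned above can be sketched as follows. The function name `pearson_chi_square` is hypothetical, and the allele counts in the usage lines are invented placeholders for illustration only, not study data. With two toxicity groups and three repeat alleles, the degrees of freedom are (2−1)×(3−1)=2, for which the upper-tail P value has the closed form exp(−χ²/2).

```python
import math

def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# HYPOTHETICAL allele counts (7, 8 and 9 x ATT) for illustration only:
# row 0 = Group 1 (high toxicity), row 1 = Group 2 (minimal toxicity)
placeholder_counts = [[60, 25, 15],
                      [130, 40, 30]]
chi2, df = pearson_chi_square(placeholder_counts)
p_value = math.exp(-chi2 / 2.0)  # closed form valid for 2 degrees of freedom only
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.3f}")
```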

In one embodiment, the number of subjects in the study is about 50-100 patients in each group. However, prior to testing to identify the presence of sequence polymorphic variants in a particular gene or genes, it is useful to understand how many individuals should be screened to provide confidence that most or nearly all pharmacogenetically relevant polymorphic variants will be found. The answer depends on the frequencies of the phenotypes of interest and on what assumptions are made about heterogeneity and magnitude of genetic effects. At the beginning, only phenotype frequencies, for example, responders vs. non-responders, frequency of various side effects, etc., are known. The occurrence of serious 5-FU/FA toxicity, for example, toxicity requiring hospitalization, is often >10%. The occurrence of life threatening toxicity is in the 1-3% range [Buroker et al., J. Clinical Oncology, 12:14-20 (1994)]. The occurrence of complete remissions is on the order of 2-8%. The lowest frequency phenotypes are about 2%.

In one embodiment, if homogeneous genetic effects are responsible for half the phenotypes of interest and for the most part the extreme phenotypes represent recessive genotypes, then one should detect alleles that will be present at about 10% frequency (0.1×0.1=0.01, or 1% frequency of homozygotes) if the population is at Hardy-Weinberg equilibrium. To have an about 99% chance of identifying such alleles would involve searching a population of 22 individuals. If the major phenotypes are associated with heterozygous genotypes then alleles present at about 0.5% frequency (2×0.005×0.995=0.00995, or about 1% frequency of heterozygotes) should be detected. A 99% chance of detecting such alleles would involve about 40 individuals. Given the heterogeneity of the North American and other populations, one should not necessarily assume that all genotypes are present in Hardy-Weinberg proportions; a substantial oversampling is performed to increase the chances of detecting relevant polymorphic variants. For initial screening, 50-100 individuals of known race/ethnicity can be screened for polymorphic variants. Polymorphic variant detection studies can be extended to outliers for the phenotypes of interest to cover the possibility that important polymorphic variants were missed in the normal population screening.
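The figure of 22 individuals for an allele at about 10% frequency can be illustrated with a simple calculation. The sketch assumes, as a modelling choice not stated in the text, that each individual contributes two independently sampled chromosomes and that "identifying" an allele means observing at least one copy; the function name `individuals_needed` is hypothetical.

```python
import math

def individuals_needed(allele_freq, detect_prob=0.99):
    """Smallest number of diploid individuals giving at least detect_prob chance
    of observing one or more copies of an allele with frequency allele_freq."""
    # Probability of missing the allele on every one of k chromosomes is (1 - f)^k;
    # solve (1 - f)^k <= 1 - detect_prob for k.
    chromosomes = math.log(1.0 - detect_prob) / math.log(1.0 - allele_freq)
    return math.ceil(chromosomes / 2.0)

print(individuals_needed(0.10))
```

For a 10% allele frequency and a 99% detection probability, this model calls for 44 chromosomes, that is, 22 individuals.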

Two major dosing regimens can be used: 5-FU plus low dose FA given for five consecutive days followed by a 23 day interval, or once weekly bolus intravenous 5-FU plus high dose FA. The higher FA dose results in plasma FA concentrations of 1 to 10 μM, comparable to those used for optimal 5-FU/FA synergy in tissue culture; however, low dose FA (20 mg/m² vs. 500 mg/m²) has produced comparable clinical benefit.

Leucovorin (folinic acid) is the most widely used 5-FU modulator; however, a variety of other molecules have been used with 5-FU, including, for example, interferon-alpha, hydroxyurea, N-phosphonacetyl-L-aspartate, dipyridamole, levamisole, methotrexate, trimetrexate glucuronate, cisplatin and radiotherapy. S-1 is a novel oral anticancer drug, composed of the 5-FU prodrug tegafur plus gimestat (CDHP) and otastat potassium (Oxo) in a molar ratio of 1:0.4:1, with CDHP inhibiting dihydropyrimidine dehydrogenase in order to prolong 5-FU concentrations in blood and tumour and Oxo present as a gastrointestinal protectant. The experimental study can be carried out with one of these modulators in addition to 5-FU.

5-FU toxicity has been well documented in randomized clinical trials. Accordingly, during the course of the experimental study, participants are monitored for such toxicities. Patients receiving 5-FU/FA are at even greater risk of toxic reactions and should be monitored carefully during therapy. A variety of side effects have been observed, affecting the gastrointestinal tract, bone marrow, heart and CNS. The most common toxic reactions are nausea and anorexia, which can be followed by life threatening mucositis, enteritis and diarrhea. Leukopenia and stomatitis are also problems in some patients, particularly with the weekly dosage regimen. Toxicity is a major cost of 5-FU/FA therapy, measured both in patient suffering and in financial terms (the cost of care for drug induced illness).

Many non-genetic factors can influence the response of cancers to drugs, including tumor location, vasculature, cell growth fraction and various drug resistance mechanisms. Accordingly, in performing the drug trial, these non-polymorphic variables are controlled for by selecting participants with common attributes.

There are many potential candidate therapeutic interventions or drugs that can affect the folate and pyrimidine pathways. Categories of these include 5-FU prodrugs, drugs that affect DNA methylation pathways, and other drugs that have been developed for indications similar to those of 5-FU. The study can be performed using one of these drugs in the alternative or in addition to 5-FU. 5-FU prodrugs are generally modified fluoropyrimidines that require one or more enzymatic activation steps for conversion into 5-FU. The activation steps may result in prolonged drug half-life and/or selective drug activation (i.e., conversion to 5-FU) in tumor cells. Examples of such drugs include capecitabine (Xeloda, Roche), a drug that is converted to 5-FU by a three-step pathway involving carboxylesterase 1, cytidine deaminase and thymidine phosphorylase. Another 5-FU prodrug is 5′ deoxy 5-FU (Furtulon, Roche), which is converted to 5-FU by thymidine phosphorylase and/or uridine phosphorylase. Another 5-FU prodrug is 1-(tetrahydro-2-furanyl)-5-fluorouracil (FT, ftorafur, Tegafur; Taiho, Bristol-Myers Squibb), a prodrug that is converted to 5-FU by the cytochrome P450 enzyme CYP3A4. In some embodiments, drugs acting on DNA methylation pathways are substituted for or used in combination with 5-FU.

A variety of drugs are being developed for indications similar to those of 5-FU, and/or are being tested in combination with 5-FU/leucovorin. These drugs can be substituted for or used in combination with 5-FU in this study. Identification of patients likely to respond to 5-FU, with or without leucovorin, may be useful in selecting optimal responders to other drugs. Alternatively, identification of patients likely to suffer a toxic response to 5-FU containing regimens can allow identification of patients best treated with other drugs. Other drugs, usable in combination or in the alternative, with activity against cancers usually treated with regimens containing 5-FU include the platinum compound oxaliplatin (L-OHP); the topoisomerase I inhibitors irinotecan (CPT-11, Pharmacia-Upjohn) and topotecan; suramin, a bis-hexasulfonated naphthylurea; 6-hydroxymethylacylfulvene (HMAF; MGI 114); LY295501; bizelesin (U-7779; NSC615291); ONYX-015; monoclonal antibodies, for example, 17-1A and MN-14; protein synthesis inhibitors such as RA 700; angiogenesis inhibitors such as PF 4; and cyclooxygenase inhibitors. Additional drugs that can be substituted for or used in combination with 5-FU in accordance with this study include the following: quinazoline derivatives such as ZD1694 (Tomudex, AstraZeneca); ZD9331 (AstraZeneca); LY231514 (Eli Lilly); GW1843 (1843U89, GlaxoWellcome); AG337; and AG331; trimetrexate (US Bioscience); edatrexate; piritrexim; and lometrexol. More generally, 5,8-dideazaisofolic acid (IAHQ), 5,10-dideazatetrahydrofolic acid (DDATHF), and 5-deazafolic acid are structures into which a variety of modifications have been introduced in the pteridine/quinazoline ring, the C9-N10 bridge, the benzoyl ring, and the glutamate side chain (see article below). Other drugs include 2,4-diaminopyrido[2,3-d]pyrimidine based antifolates.

Example 9

The experimental study described in Example 8 is repeated using a relevant cardiovascular drug. This study and similar studies are helpful in improving therapies for atherosclerosis, thromboembolic diseases and other forms of vascular and heart disease. Homocysteine is a proven risk factor for cardiovascular disease. One important role of the folate cofactor 5-methyltetrahydrofolate is the provision of a methyl group for the remethylation of homocysteine to methionine by the enzyme methionine synthase. Variation in the enzymes of folate metabolism, for example methionine synthase or methylenetetrahydrofolate reductase (MTHFR), may affect the levels of 5-methyltetrahydrofolate or other folates that in turn influence homocysteine levels. The contribution of elevated homocysteine to atherosclerosis, thromboembolic disease and other forms of vascular and heart disease may vary from one patient to another. Such variation may be attributable, at least in part, to genetically determined variation in the levels or function of the enzymes of folate and one carbon metabolism described in this application. Understanding which patients are most likely to benefit from particular drugs assists in the clinical development or use of drugs to treat cardiovascular diseases. Such drugs include those aimed at the modulation of folate levels, for example, supplemental folate, and at other known causes of cardiovascular disease, for example, lipid lowering drugs such as statins, or antithrombotic drugs such as salicylates, heparin or GPIIb/IIIa inhibitors. In some embodiments, patients whose disease is significantly attributable to elevated homocysteine are excluded from treatment with agents aimed at the amelioration of other etiological causes, such as elevated cholesterol.

Example 10

The experimental study described in Example 8 is repeated using a relevant central nervous system (CNS) drug. The observation that phencyclidine, an NMDA receptor antagonist, induces a psychotic state closely resembling schizophrenia in normal individuals has led to attempts to modulate NMDA receptor function in schizophrenic patients. The amino acid glycine is an obligatory coagonist, with glutamate, at NMDA receptors via its action at a strychnine-insensitive binding site on the NMDA receptor complex. Consequently, glycine or glycinergic agents, for example, glycine, the glycine receptor partial agonist D-cycloserine, or the glycine prodrug milacemide, have been tried as adjuncts to conventional antipsychotics for the treatment of schizophrenia. Several trials have demonstrated a moderate improvement in negative symptoms of schizophrenia. Because the folate pathway modulates levels of serine and glycine, the endogenous levels of glycine in neurons may affect the response to glycine or glycinergic drugs. CNS drugs can also include drugs for treatment or prevention of Alzheimer's disease or other dementia.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

The use of the terms “a” and “an” and “the” and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context. 

1.-2. (canceled)
 3. A method of testing for an increased susceptibility for a complication related to a defect in a one-carbon metabolic pathway, the method comprising: (a) screening a sample from a subject to detect the presence or absence of a polymorphic variant of a polymorphism in at least one chromosomal copy of the MTHFD1L gene, wherein the polymorphic variant is associated with an increased susceptibility for a complication related to a defect in a one-carbon metabolic pathway; and (b) diagnosing the susceptibility of the subject for a complication related to a defect in a one-carbon metabolic pathway based on the presence or absence of the polymorphic variant of at least one chromosomal copy of the MTHFD1L gene.
 4. The method of claim 3, wherein the complication is related to administration of a drug.
 5. The method of claim 4, wherein the drug is selected from the group consisting of a chemotherapeutic drug, a cardiovascular drug, and a central nervous system (CNS) drug.
 6. The method of claim 3, wherein the complication is selected from the group consisting of a pregnancy-related complication, miscarriage, second trimester miscarriage, placental abruption, severe placental abruption, neural tube defect, and cardiovascular disease.
 7. The method of claim 6, wherein the complication is a neural tube defect, and wherein the neural tube defect is in a child of the subject screened.
 8. The method of claim 7, wherein the neural tube defect is selected from the group consisting of anencephaly, encephalocele, iniencephaly, and spina bifida.
 9. The method of claim 3, wherein the subject is selected from the group consisting of a female, a female of child-bearing age, a pregnant female, a female that has had complications during a previous pregnancy, a female that has had complications becoming pregnant, a male, and a gestating child.
 10. The method of claim 3, wherein the sample comprises an egg, a sperm, a somatic cell, or blood.
 11. The method of claim 3, wherein the polymorphic variant is selected from the group consisting of a single nucleotide polymorphism (SNP) and a short tandem repeat polymorphism (STRP).
 12. The method of claim 3, wherein the polymorphic variant is present in a single chromosomal copy of the gene, and wherein heterozygosity is associated with an increased risk for the complication.
 13. The method of claim 3, wherein the sample comprises two chromosomal copies of the gene, wherein the polymorphic variant is present in both chromosomal copies of the gene, wherein homozygosity of the polymorphic variant is associated with an increased risk for the complication, and wherein the complication is diagnosed if homozygosity of the polymorphic variant is detected.
 14. The method of claim 3, wherein the sample comprises a nucleic acid selected from the group consisting of (a) a nucleic acid encoding MTHFD1L, (b) a fragment of (a) comprising at least 30 contiguous nucleotides of SEQ ID NO: 12 wherein the 30 contiguous nucleotides comprise the polymorphism, (c) a complement of (a) or (b), and (d) a combination of two or more of (a), (b), and (c).
 15. The method of claim 3, wherein the screening comprises assaying a sample comprising a nucleic acid selected from the group consisting of (a) a nucleic acid encoding MTHFD1L, (b) a fragment of (a) comprising at least 30 contiguous nucleotides of SEQ ID NO: 12 wherein the 30 contiguous nucleotides comprise the tandem repeated “ATT” sequence of the rs3832406 polymorphism of MTHFD1L, (c) a complement of (a) or (b), and (d) a combination of two or more of (a), (b), and (c).
 16. The method of claim 15, wherein the polymorphic variant is an “ATT” tandem repeat of rs3832406, and the diagnosis is for an increased susceptibility for the complication if the tandem repeat comprises seven or fewer “ATT” repeats.
 17. The method of claim 15, wherein the polymorphic variant causes an alternative splicing of a mRNA derived from the MTHFD1L gene or a decrease in the synthetase activity of the MTHFD1L gene product.
 18. The method of claim 3, wherein the sample is further screened for a polymorphic variant of a polymorphism in a gene selected from the group consisting of the MTHFR gene, the coagulation factor II gene, the coagulation factor V gene, and the transcobalamin II gene, and wherein the polymorphic variant is associated with the complication.
 19. The method of claim 3, wherein the sample is screened using a method selected from the group consisting of a nucleic acid array, allele-specific-oligonucleotide (ASO) hybridization, PCR-RFLP analysis, PCR, single-strand conformation polymorphism (SSCP) technique, an amplification refractory mutation system (ARMS) technique, nucleotide sequencing, an antibody specific to the protein encoded by the polymorphic variant containing gene, and mass spectrometry.
 20.-23. (canceled)